Ballpark figures: Analyzing MLB baseball attendance
It's springtime in the U.S., which means something as American as apple pie is back: baseball. Because there is so much great data around one of the country's favorite pastimes, we decided to use this week's post to look at Major League Baseball (MLB) attendance figures from the past 20 years. These figures are published on many sites, including the one we used to get the data for the charts below: ESPN.com.
To collect the attendance data from ESPN, we used Jupyter Workspaces (currently in beta in Domo) and the Python package Beautiful Soup to parse the HTML. And since Domo can now schedule code in Jupyter Workspaces to run on a regular basis, you can be confident that this page will continue to update with the 2022 data.
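As a rough sketch of the scraping step: the post used Beautiful Soup, but to keep this example dependency-free it swaps in Python's standard-library html.parser instead. The two-column Team/Avg table layout, the class name TableParser, the helper parse_attendance, and the sample rows are all hypothetical; ESPN's actual page markup is not described in this post and will differ.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped into <tr> rows."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # row currently being built
        self._cell = None  # text fragments of the current cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Only keep text inside a cell (this also picks up text in nested <a> tags)
        if self._cell is not None:
            self._cell.append(data)


def parse_attendance(html):
    """Return {team: average attendance} from a hypothetical Team/Avg table."""
    parser = TableParser()
    parser.feed(html)
    result = {}
    for team, avg, *_ in parser.rows[1:]:  # skip the header row
        result[team] = int(avg.replace(",", ""))
    return result


# Made-up markup for illustration only; not ESPN's real HTML or real figures.
SAMPLE = """
<table>
  <tr><th>Team</th><th>Avg</th></tr>
  <tr><td>Team A</td><td>40,000</td></tr>
  <tr><td>Team B</td><td>35,000</td></tr>
</table>
"""

print(parse_attendance(SAMPLE))  # {'Team A': 40000, 'Team B': 35000}
```

With Beautiful Soup the cell-walking class disappears, but the shape of the result (one dict entry per team row) would be the same.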
The first thing you'll probably notice when looking at the data is that 2020 is missing. That's because, due to the pandemic, baseball was played without fans that year. There was a bit of a return to normalcy in 2021, but it wasn't until this season that all spectator restrictions were lifted, so it will be interesting to watch how attendance rebounds. (In full transparency, we only have data for full seasons right now, so we are not capturing anything related to seasonality, such as how weather or a team's position in the playoff race affects ticket sales.)
One great way to review this data is with an old favorite of many data scientists: the box-and-whisker plot. The whiskers (the top and bottom lines) show each team's minimum and maximum average attendance. I have sorted the chart to show the team with the highest peak attendance year on the left and the lowest on the right:
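The numbers behind each box and whisker can be sketched in a few lines of Python. This assumes the data is a dict mapping each team to a list of yearly average attendance values, and it uses the standard library's statistics.quantiles with the inclusive method; the helper names box_stats and sort_by_peak are hypothetical, and Domo's chart may compute quartiles slightly differently.

```python
from statistics import quantiles


def box_stats(values):
    """Five-number summary of one team's yearly average attendance."""
    # "inclusive" suits data that already contains the extreme values
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "q1": q1, "median": med, "q3": q3, "max": max(values)}


def sort_by_peak(team_values):
    """Order teams from highest to lowest peak (max) season, like the chart."""
    stats = {team: box_stats(v) for team, v in team_values.items()}
    return sorted(stats.items(), key=lambda kv: kv[1]["max"], reverse=True)


# Hypothetical values, not real attendance figures
example = {"Team A": [10, 20, 30, 40, 50], "Team B": [25, 26, 27, 28, 60]}
for team, s in sort_by_peak(example):
    print(team, s)
```

Team B prints first because its single best season (60) beats Team A's best (50), even though most of Team B's seasons are lower.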
Where the visualization gets more interesting for me is with the boxes. Each box shows the range between the 25th and 75th percentiles, which is meant to reflect how much a team's attendance has swung over the years. Larger boxes tell me those teams (such as Philadelphia and Detroit) have had some great years for attendance and some not-so-great years. Smaller boxes (such as Boston's) say that a team has been very consistent in its attendance numbers. We have also filtered the chart to pre-pandemic years only, since 2021 (and, to a lesser extent, the partial 2022 data) skews the results.
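To rank teams by box size, you can compute the interquartile range (Q3 minus Q1) per team after dropping post-pandemic seasons. This is a minimal sketch, assuming rows of (team, year, average attendance) tuples; the function name iqr_by_team and the last_year cutoff parameter are illustrative, not part of Domo.

```python
from statistics import quantiles


def iqr_by_team(rows, last_year=2019):
    """Box height (Q3 - Q1) per team, using only seasons up to last_year."""
    by_team = {}
    for team, year, avg in rows:
        if year <= last_year:  # keep pre-pandemic seasons only
            by_team.setdefault(team, []).append(avg)

    iqr = {}
    for team, values in by_team.items():
        q1, _, q3 = quantiles(values, n=4, method="inclusive")
        iqr[team] = q3 - q1

    # Biggest boxes (largest swings) first, most consistent teams last
    return dict(sorted(iqr.items(), key=lambda kv: kv[1], reverse=True))


# Hypothetical rows: (team, year, average attendance)
rows = [
    ("Team A", 2017, 10), ("Team A", 2018, 20), ("Team A", 2019, 30), ("Team A", 2021, 5),
    ("Team B", 2017, 20), ("Team B", 2018, 21), ("Team B", 2019, 22), ("Team B", 2021, 8),
]
print(iqr_by_team(rows))  # {'Team A': 10.0, 'Team B': 1.0}
```

The 2021 rows are discarded before any quartile is computed, so the unusually low seasons cannot stretch the boxes.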
Another way to understand how teams rank in attendance is to create indexes of where a team's attendance stands relative to the overall MLB average, which is what we have done right below. Dark blue boxes mean that a team is well above the average, while dark orange boxes mean that a team is well below it. You can use the filters to look at whatever league, division, team(s), or year(s) you are interested in:
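A minimal sketch of the index idea, assuming the MLB average is the simple mean of the team averages and the index is scaled so that 100 means exactly average. The 20-point band used to decide what counts as "well above" or "well below" is a hypothetical choice, not the threshold behind the colors in the chart.

```python
def index_to_average(team_avg):
    """Index each team to the mean of all teams' averages (100 = exactly average)."""
    overall = sum(team_avg.values()) / len(team_avg)
    return {team: 100 * avg / overall for team, avg in team_avg.items()}


def shade(index, band=20):
    """Hypothetical color bucket for a heat-map cell."""
    if index >= 100 + band:
        return "dark blue"    # well above average
    if index <= 100 - band:
        return "dark orange"  # well below average
    return "neutral"


# Hypothetical team averages, not real data
idx = index_to_average({"Team A": 30000, "Team B": 20000, "Team C": 10000})
print({team: (round(v), shade(v)) for team, v in idx.items()})
```

An attendance-weighted average (total fans divided by total games) would be an equally reasonable denominator; the choice changes the numbers but not the ranking logic.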
Long-time Domo users may be looking at these indexes and thinking that I did some pre-calculation in a Magic ETL or a Dataset View. It's true that calculations at these kinds of aggregate levels usually require pre-calculation. But if I did that, it would be hard to allow for the year filter. So, the secret is out: with Domo's new FIXED beast modes (currently in beta), you can do fixed level of detail calculations right in a beast mode. For the "Index to League Avg" above, this is the calculation:

You can see there are two things happening here. First, when I use SUM FIXED by League, it sums across all values in the same league as the row I am on. That gives me the league total we need for the denominator of the index. Second, it uses FILTER ALLOW to tell Domo that filters on Year can affect the FIXED function. The available options are FILTER ALLOW, FILTER DENY, and FILTER NONE.
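To see why letting the Year filter reach the FIXED sum matters, here is a rough Python emulation of the idea (not Domo's engine). It assumes rows of (team, league, year, attendance), and it assumes the league average is the league total divided by the number of teams in that league, since the exact beast mode is not reproduced here. The function league_index, its parameters, and the sample rows are hypothetical.

```python
from collections import defaultdict


def league_index(rows, years=None, filter_allow_year=True):
    """Index each team's attendance to its league's average, emulating
    SUM(...) FIXED (BY league) with or without the Year filter reaching it.

    rows: (team, league, year, attendance) tuples
    years: set of years kept by the dashboard filter (None = no filter)
    filter_allow_year: True lets the Year filter reach the FIXED sum;
        False makes the FIXED sum see every year regardless of the filter
    """
    visible = [r for r in rows if years is None or r[2] in years]
    fixed_source = visible if filter_allow_year else rows

    # FIXED (BY league): one total per league (Domo repeats it on each row)
    league_total = defaultdict(float)
    league_teams = defaultdict(set)
    for team, league, _year, att in fixed_source:
        league_total[league] += att
        league_teams[league].add(team)

    # Ordinary aggregate: each team's total over the rows that survive the filter
    team_total = defaultdict(float)
    team_league = {}
    for team, league, _year, att in visible:
        team_total[team] += att
        team_league[team] = league

    result = {}
    for team, total in team_total.items():
        lg = team_league[team]
        league_avg = league_total[lg] / len(league_teams[lg])  # assumed denominator
        result[team] = total / league_avg
    return result


# Hypothetical data: one league, two teams, two seasons
ROWS = [
    ("Team A", "AL", 2018, 100), ("Team B", "AL", 2018, 300),
    ("Team A", "AL", 2019, 200), ("Team B", "AL", 2019, 200),
]
print(league_index(ROWS, years={2018}))                           # {'Team A': 0.5, 'Team B': 1.5}
print(league_index(ROWS, years={2018}, filter_allow_year=False))  # {'Team A': 0.25, 'Team B': 0.75}
```

With the 2018 filter allowed through, the league average is 200 and the indexes are 0.5 and 1.5. If the FIXED sum ignored the filter, the denominator would cover both years, and the indexes would shrink to 0.25 and 0.75 even though nothing about 2018 changed.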
Here's one last example of how useful FIXED with FILTER DENY can be. The bar charts below default to the New York Yankees (my boss's favorite team). The first chart does not use FIXED, so when I filter for the Yankees, the Min, Max, and Median fields become meaningless because they get filtered down to the same value as the selected team. The second chart uses FIXED with FILTER DENY on team name, so the Min, Max, and Median still serve as reference points for the main average, which is the Yankees'.
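The difference between the two bar charts can be mimicked in a few lines. This is a hypothetical sketch: team_avg maps team names to average attendance, and the deny_team_filter flag stands in for FIXED with FILTER DENY on team name; the function and the sample values are illustrative only.

```python
from statistics import median


def reference_stats(team_avg, selected, deny_team_filter=True):
    """Value for the selected team's bar plus Min/Max/Median reference lines.

    deny_team_filter=True mimics FIXED with FILTER DENY on team name: the team
    filter is ignored, so the reference stats cover every team.
    deny_team_filter=False mimics a plain aggregate that only sees the
    filtered (selected) team.
    """
    pool = team_avg if deny_team_filter else {selected: team_avg[selected]}
    values = list(pool.values())
    return {
        "selected": team_avg[selected],
        "min": min(values),
        "max": max(values),
        "median": median(values),
    }


# Hypothetical averages, not real attendance figures
teams = {"Team A": 30, "Team B": 10, "Team C": 20}
print(reference_stats(teams, "Team A"))                          # min 10, max 30, median 20
print(reference_stats(teams, "Team A", deny_team_filter=False))  # everything equals 30
```

With deny_team_filter=False, all four numbers collapse to the selected team's value, which is the meaningless result the first chart shows.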
One of the things I love (and also at times find maddening) about exploring new data is that there is always more to explore. As I worked on this post, I realized that it would be really interesting to bring in teams' win/loss records as well as data on stadium capacity. But then I thought: let's maybe save that for a future post.