The Streak

If you’ve read my post about the past 10 years of Home Runs (and you should!), then you know that I’m a big fan of baseball! And right now we are in the middle of the 75th anniversary of one of the game’s great feats. The number 56 means more to the serious baseball fan than to the average fan, because it is the record for consecutive games with a hit.

The Yankee Clipper, Joltin’ Joe DiMaggio, safely hit in 56 consecutive games between May 15 and July 16, 1941.
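To make “consecutive games with a hit” concrete, here is a minimal Python sketch that finds the longest hitting streak in a list of game-by-game hit totals. The function name and sample numbers are made up for illustration. It also simplifies the official scoring rules: it treats every hitless game as ending the streak, whereas under MLB rules a game in which a player has no official at-bat (for example, only walks) does not end a streak.

```python
from typing import Iterable


def longest_hit_streak(hits_per_game: Iterable[int]) -> int:
    """Return the longest run of consecutive games with at least one hit.

    hits_per_game: hits in each game, in chronological order.
    Simplification: every hitless game ends the streak, even one with
    no official at-bats (the real scoring rules make exceptions there).
    """
    longest = current = 0
    for hits in hits_per_game:
        if hits > 0:
            current += 1
            longest = max(longest, current)
        else:
            current = 0  # a hitless game ends the streak
    return longest


# Made-up game-by-game hit totals, not DiMaggio's actual log
print(longest_hit_streak([1, 2, 0, 1, 1, 3, 0, 1]))  # → 3
```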

Let’s take a moment to let that record sink in. He hit in 56 games…in a row. The last player to get that close to Joe’s record was the Hit King, Pete Rose, with 44 straight games in 1978. More recently, two players THIS season have passed the 20-game mark, and both of them play for the Boston Red Sox.

In fact, the last player to break the 30 game threshold was Albert Pujols in 2003 with 30.

But enough about the present, back to Joe. Baseball was a completely different game back in those days. Pitchers pitched complete games more often than not. The pace of play was much faster, probably due in part to the fact that the batters didn’t wear batting gloves, among other things.

When the anniversary of the streak began, MLB launched a social media campaign to commemorate the record. They tweeted each hit and the final score of each game as it happened, and they even had a staffer write up a quasi-review of each game. They’re pretty entertaining. You can check out the complete list here.

Because baseball is the most documented game there ever was, there are websites like Baseball-Reference and Retrosheet, among others, dedicated to letting fans dive in and relive any player, any game, or just about anything else they want. For me as a Tableau #datarockstar, this is awesome: with a bit of data mining and shaping, I could create a dataset that would let me build a dashboard in Tableau. And that’s just what I did.

The Data

I went to Retrosheet and got the game log files for all games played in 1941 (yes, you can do that for just about any year). This gave me fields such as:

  • Game Date
  • Teams
  • Final Score
  • Attendance
  • Game Time
  • Line Score (runs per inning per team)
  • Just about any aggregated level of game data you can think of.
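Retrosheet game logs are comma-separated text files with no header row, one game per line. The sketch below pulls a few of the fields listed above using Python’s csv module. The column positions are assumptions taken from Retrosheet’s game log field guide, so check them against that guide before using real files; the line-score convention (one digit per inning, parentheses around innings of 10 or more runs, “x” for an unplayed half-inning) is likewise assumed, and the sample row is made up.

```python
import csv
import io
import re

# 0-based column positions ASSUMED from Retrosheet's game log field guide;
# verify them against the guide before using real files.
DATE, VISITOR, HOME = 0, 3, 6
VISITOR_RUNS, HOME_RUNS = 9, 10
ATTENDANCE, MINUTES = 17, 18
VISITOR_LINE, HOME_LINE = 19, 20


def parse_line_score(line):
    """Split a line score such as '0100(10)020x' into runs per inning.

    Assumes one digit per inning, parentheses around innings of 10+ runs,
    and 'x' for an unplayed half-inning (which is skipped).
    """
    return [int(big or small) for big, small in re.findall(r"\((\d+)\)|(\d)", line)]


def optional_int(value):
    """Convert a possibly empty field to int, or None if it is blank."""
    return int(value) if value else None


def parse_game_log(text):
    games = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        games.append({
            "date": row[DATE],  # yyyymmdd
            "visitor": row[VISITOR],
            "home": row[HOME],
            "visitor_runs": int(row[VISITOR_RUNS]),
            "home_runs": int(row[HOME_RUNS]),
            "attendance": optional_int(row[ATTENDANCE]),
            "minutes": optional_int(row[MINUTES]),
            "visitor_innings": parse_line_score(row[VISITOR_LINE]),
            "home_innings": parse_line_score(row[HOME_LINE]),
        })
    return games


# A made-up row, cut off after the home line score (real rows are longer)
sample = ('"19410601",0,"Sun","VIS","AL",1,"HOM","AL",1,1,13,51,"D","","","",'
          '"PARK1",9040,120,"000000100","0100(10)020x"\n')
game = parse_game_log(sample)[0]
print(game["home_innings"])  # → [0, 1, 0, 0, 10, 0, 2, 0]
```

Summing each parsed line score and comparing it with the final-score columns is a cheap sanity check that the assumed column positions are right.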

The game logs didn’t include a per-game log of each player’s at-bats, but that’s where Baseball-Reference came in. There, I was able to gather data such as (per game):

  • Inning of At-bat
  • Pitcher
  • Outs
  • Which bases were occupied
  • At-bat outcome
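Baseball-Reference presents at-bats as play-by-play descriptions, so some cleanup is needed before they can be counted. Assuming each plate appearance has already been reduced to a short outcome code (the codes, the `PlateAppearance` class, and `per_game_totals` below are all hypothetical), per-game at-bats and hits could be tallied like this. Walks, hit-by-pitches, and sacrifices are left out of at-bats, following the usual scoring definition, and keying on date plus game number keeps the games of a doubleheader apart.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical outcome codes; real play-by-play descriptions
# would need to be mapped onto these first.
HIT_CODES = {"1B", "2B", "3B", "HR"}
NON_AT_BAT_CODES = {"BB", "HBP", "SF", "SH"}  # walk, hit by pitch, sacrifices


@dataclass
class PlateAppearance:
    game_date: str    # yyyymmdd
    game_number: int  # assumed: 0 = single game, 1/2 = doubleheader games
    inning: int
    pitcher: str
    outs: int
    runners: str      # hypothetical encoding, e.g. "1" = runner on first
    outcome: str      # a code above, or e.g. "OUT", "SO", "E"


def per_game_totals(pas):
    """Tally official at-bats (AB) and hits (H) per (date, game number)."""
    totals = defaultdict(lambda: {"AB": 0, "H": 0})
    for pa in pas:
        game = totals[(pa.game_date, pa.game_number)]
        if pa.outcome not in NON_AT_BAT_CODES:
            game["AB"] += 1
        if pa.outcome in HIT_CODES:
            game["H"] += 1
    return dict(totals)


# Made-up plate appearances
sample = [
    PlateAppearance("19410601", 1, 1, "Pitcher A", 2, "", "1B"),
    PlateAppearance("19410601", 1, 4, "Pitcher A", 0, "1", "BB"),
    PlateAppearance("19410601", 1, 6, "Pitcher A", 1, "", "OUT"),
    PlateAppearance("19410601", 2, 2, "Pitcher B", 0, "", "SO"),
]
print(per_game_totals(sample))
```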

After that, it was a lot of trial and error in Tableau to get everything to fit and calculate correctly.

When I started this project, I knew that I wanted to be able to look at each game and each individual at bat. Plus I wanted to see some stats over the course of the streak.
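For stats over the course of the streak, one natural option is a running batting average: cumulative hits divided by cumulative at-bats after each game. Here is a minimal Python sketch, assuming a list of per-game (at_bats, hits) pairs in chronological order; the function name and the numbers are made up.

```python
from itertools import accumulate


def running_batting_average(per_game):
    """Cumulative batting average (hits / at-bats) after each game.

    per_game: a list of (at_bats, hits) pairs in chronological order.
    Returns averages rounded to three places, or None before the first at-bat.
    """
    cum_ab = accumulate(ab for ab, _ in per_game)
    cum_h = accumulate(h for _, h in per_game)
    return [round(h / ab, 3) if ab else None for ab, h in zip(cum_ab, cum_h)]


# Made-up (at_bats, hits) for three games
print(running_batting_average([(4, 1), (3, 2), (5, 0)]))  # → [0.25, 0.429, 0.25]
```

Inside Tableau, the same idea could be written as a table calculation such as `RUNNING_SUM(SUM([H])) / RUNNING_SUM(SUM([AB]))`, where `[H]` and `[AB]` are hypothetical field names.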

Here’s what I came up with! Click here to view the interactive version.

The Guts

There aren’t really any advanced Tableau tricks in here to speak of. However, the one thing I want to point out about this dashboard is that it’s 100% floating: there are no tiled objects. To the average Joe, this means nothing. But the seasoned Tableau user knows what it means: countless hours adjusting objects pixel by pixel, then publishing to Tableau Public only to find that it doesn’t match what you see on Desktop, so you tweak some more, republish, tweak some more…you get the idea.

On a personal note, I had a blast making this viz and learning more about the streak than I already knew. For example, did you know:

  • The Streak took place in 1941, the exact same year as another famous baseball milestone? The other number that all diehard baseball fans know is .406. Yes, in the same season DiMaggio hit in 56 consecutive games, Ted Williams of the Boston Red Sox finished with a batting average of .406. No MLB player has hit .400 for a season since then. Maybe I’ll do that next?
  • His streak was good enough to beat out Williams for the 1941 American League MVP. While creating this viz, I talked with my dad a lot about these stats, and we discussed how interesting it was that hitting in 56 straight games was valued more highly than a .406 season. After doing some research, my guess is that hitting .400 was simply more common back then, so the streak was more impressive in the context of what had come before. Joe broke a 45-game record that had stood since 1896, and in that same time frame (1896-1941) there were 17 instances (12 unique players) of a .400+ season.

I hope you enjoyed my dashboard and maybe learned a thing or two about the great game of baseball.

Until next time!

Download the Raw Data
