18 Responses to “Joe DiMaggio and Probability”

  1. Chris Meyer March 30, 2008 at 9:12 am Permalink

    The article is wonderful–Stuart Kauffman meets Nassim Taleb in a universe of baseball diamonds.
    But it’s not clear to me why you chose the game rather than the at-bat as the unit of analysis–please explain, or point to a paper.


  2. Sam Arbesman March 30, 2008 at 11:03 am Permalink

    Glad you enjoyed the article. We chose the probability of a hit in each game rather than the probability of a hit for each plate appearance because the record was for the number of consecutive games with a hit. DiMaggio could still strike out occasionally, but as long as he got a hit at least once in a game, it counted. So that was what we tried to replicate in our calculations. Thanks.
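
One way to relate the two units of analysis is sketched below. It is an illustration, not necessarily the authors' calculation: it assumes every at-bat is an independent trial with the same hit probability, and the .357 average and 4 at-bats per game are illustrative inputs.

```python
def prob_hit_in_game(batting_average: float, at_bats: int) -> float:
    """Chance of at least one hit in a game, assuming independent at-bats."""
    return 1.0 - (1.0 - batting_average) ** at_bats


def prob_streak(per_game_prob: float, games: int) -> float:
    """Chance of a hit in each of `games` given consecutive games,
    assuming the same per-game probability every day."""
    return per_game_prob ** games


# Illustrative inputs (assumptions): a .357 hitter with 4 at-bats per game.
p_game = prob_hit_in_game(0.357, 4)
p_56 = prob_streak(p_game, 56)
print(f"per-game hit probability: {p_game:.3f}")
print(f"chance of hitting in 56 specific games in a row: {p_56:.2e}")
```

Working at the game level means the streak calculation only needs the per-game probability, which is why a hitless at-bat inside a game with a hit does not matter.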

  3. Neal Canon March 30, 2008 at 11:50 am Permalink

    I saw your article in the New York Times this morning, and I was just wondering if you took into account the psychology of having a significant hitting streak going. I think that the pressure from the media, teammates, oneself, etc. to extend the hitting streak could adversely affect the hitter’s performance, making it less likely for him to extend the streak as the streak gets longer and longer. There may be an inverse relationship between the length of the streak and a player’s batting average, whereas your study assumes that there is no relation between the two. (Your study used the player’s batting average over the season to calculate the probability of getting a hit in a given game.) To accurately predict the likelihood of a streak, this relationship between hitting streak and batting average would need to be taken into account (if it exists).

  4. Josh March 30, 2008 at 12:14 pm Permalink

    I enjoyed the piece too. Are you planning on posting a full paper or further detail? I’m curious who some of the contemporary players posting long streaks are.

  5. Sam Arbesman March 30, 2008 at 12:43 pm Permalink

    @Neal: You make a good point, and one that can be looked at, in a similar way to how the phenomenon of the Hot Hand has been examined.

    @Josh: We are hoping to publish a paper with more details. Until then, a quick glance at the contemporary players with long streaks in our simulations reveals Ichiro Suzuki and Coco Crisp, among others.

    Thanks so much for the comments.

  6. JSE March 30, 2008 at 1:38 pm Permalink

    Very nice work. I blogged about this over at Quomodocumque.

    The number I’d like to see: in how many of the 10,000 runs was there a season _after_ 1941 with a hitting streak of at least 56 games?

  7. Sam Arbesman March 30, 2008 at 4:32 pm Permalink

    @JSE: Glad you enjoyed the article. We were able to tabulate the number of runs in which a season after 1941 had a streak of at least 56 games. It’s 301. So, if past performance is indicative of future results (which need not be true), it seems that we would have to wait many, many more years for there to be a good chance of a long streak happening again.
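
A minimal sketch of this kind of tally follows. It is a deliberately simplified stand-in for the authors' simulation: it models a single hypothetical player-season with a fixed per-game hit probability, and the function names, the 0.81 probability, and the 154-game season are assumptions made for illustration.

```python
import random


def longest_streak(per_game_prob: float, games: int, rng: random.Random) -> int:
    """Longest run of consecutive games with a hit in one simulated season."""
    best = current = 0
    for _ in range(games):
        if rng.random() < per_game_prob:
            current += 1
            best = max(best, current)
        else:
            current = 0
    return best


def fraction_of_runs_with_streak(per_game_prob: float, games: int,
                                 target: int, runs: int, seed: int = 0) -> float:
    """Fraction of simulated seasons whose longest streak reaches `target`."""
    rng = random.Random(seed)
    successes = sum(
        longest_streak(per_game_prob, games, rng) >= target for _ in range(runs)
    )
    return successes / runs


# The count reported in the comment: 301 of 10,000 runs.
reported_fraction = 301 / 10_000
print(f"reported fraction of runs: {reported_fraction:.2%}")

# Hypothetical single-season example (all inputs are assumptions).
print(fraction_of_runs_with_streak(0.81, games=154, target=56, runs=10_000))
```

The real study covered every player-season in baseball history in each run, so its counts are not directly comparable to this one-season toy.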

  8. Mark Eisner March 30, 2008 at 9:55 pm Permalink

    In addition to presenting interesting results about baseball, you have provided a great exposition of the ideas involved in Monte Carlo simulation for the lay reader.

    Have you taken any simulation (or other) courses in Operations Research? I hope so, since that would give me a ‘hook’ to do a news article about your work on the http://www.orie.cornell.edu web site. Eventually I’ll set up a blog there but in the meantime the news section is the best vehicle I have.

    Mark Eisner

  9. Dave March 31, 2008 at 2:01 pm Permalink

    Loved the article, thanks. Given the data it seems highly likely that Joe D was more than an 81% weighted coin… My conclusion: Joe got hot! (Is there such a thing?) Did you know that Joe D also holds the minor league record with a 60 game hitting streak? I wonder how many universes it would take for that to recur!

  10. Grant Sterling March 31, 2008 at 3:20 pm Permalink

    Did you take into account that the number of at-bats varies from game to game? If Joe gets 4 at bats every game and Joanna alternates between getting 6 in one game and 2 in the next, they may end their seasons with the same number of games, at bats, and hits, but Joanna would be substantially less likely to have a long hitting streak.
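
Grant's point can be checked with a short calculation. The sketch below assumes independent at-bats and a hypothetical .300 per-at-bat hit probability; only the 4/4 and 6/2 at-bat patterns come from the comment.

```python
def prob_hit_in_game(p_at_bat: float, at_bats: int) -> float:
    """Chance of at least one hit in a game, assuming independent at-bats."""
    return 1.0 - (1.0 - p_at_bat) ** at_bats


def prob_hit_every_game(p_at_bat: float, at_bats_per_game: list) -> float:
    """Chance of a hit in every listed game, one at-bat count per game."""
    prob = 1.0
    for at_bats in at_bats_per_game:
        prob *= prob_hit_in_game(p_at_bat, at_bats)
    return prob


P_AT_BAT = 0.300  # hypothetical per-at-bat hit probability

joe = prob_hit_every_game(P_AT_BAT, [4, 4])     # 4 at-bats in each game
joanna = prob_hit_every_game(P_AT_BAT, [6, 2])  # 6 at-bats, then 2

print(f"Joe, two-game streak:    {joe:.3f}")
print(f"Joanna, two-game streak: {joanna:.3f}")
```

Both players have the same total at-bats and the same expected hits, yet Joanna's two-at-bat games are weak links that pull her streak probability below Joe's.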

  11. Chris March 31, 2008 at 3:36 pm Permalink

    I loved the article but worry about the statistics used. In particular, using a fixed unconditional probability of hitting in a game will tend to overestimate the likelihood of a streak given variation in actual performance. One could analyze this to death, but let us focus narrowly on overall team quality, perhaps measured by win-loss percentage or average runs earned against the opposing team.

    For example, consider the effect of playing easy or hard teams. Let a batter hit successfully in 60% of games played, but in truth let those odds change to 80% success against easy teams (like the Devil Rays) and 40% success against hard teams (like the Red Sox).

    In this example, consider the probability of a two-game hitting streak in a series of games, one against the easy team and the other against the hard team. Unconditionally, the probability is .6 x .6, or 36%. Conditioned on the identity of the teams, the probability is .8 x .4, or 32%. This four percentage point fall reflects an 11% drop in the probability of hitting the streak. Back-of-the-envelope calculations suggest this effect grows as batter quality grows and team quality disparity grows. Translated back into baseball terms, the probabilities given in the article are most overstated for the best batters playing in times of great league disparity.
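
The arithmetic in this example can be reproduced directly. The snippet below uses only the numbers from the comment; the helper name `streak_prob` is introduced here for illustration.

```python
import math


def streak_prob(per_game_probs: list) -> float:
    """Probability of a hit in every game, given each game's hit probability."""
    return math.prod(per_game_probs)


unconditional = streak_prob([0.6, 0.6])  # same 60% chance in both games
conditional = streak_prob([0.8, 0.4])    # easy opponent, then hard opponent

absolute_drop = unconditional - conditional   # in probability points
relative_drop = absolute_drop / unconditional

print(f"unconditional: {unconditional:.2f}")
print(f"conditional:   {conditional:.2f}")
print(f"relative drop: {relative_drop:.1%}")
```

Swapping in other pairs with the same average (for example 0.9 and 0.3) gives a quick feel for how the gap widens as the disparity between opponents grows.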

    Table 1 thus overestimates the expected length of hitting streaks; the true distribution probably has the same appearance but would be shifted to the left. Table 2 thus overestimates the expected number of streaks in eras of great league disparity; those big 1890s spikes are probably lower, as are the lesser spikes in the 1990s and 2000s. The conclusion that DiMaggio’s 1941 hitting streak was anomalous in timing is strengthened, while the conclusion that his streak was not anomalous in length is weakened.

    I am interested in learning the size of this effect and its impact on the authors’ calculations.

  12. Sam Arbesman April 1, 2008 at 5:28 pm Permalink

    @Grant and Chris: you both bring up very good points. These are issues that can certainly be addressed and have effects that are likely to be important. Thanks for bringing up these factors!

  13. Eric April 2, 2008 at 12:48 pm Permalink


    Your results are very interesting, and I think there are some interesting conclusions to be drawn that are not in the article. From an earlier response it seems that there was/is a 3% chance each year since 1941 that the record would be broken. However, it has not been, and no one has even come close. Pete Rose had a 44 game streak in 1978 and Paul Molitor had a 39 game streak in 1987. While your model certainly does not rule this possibility out, it certainly seems to suggest that it is highly unlikely. Is it just coincidence, or is something about the model overestimating the probability? Likewise, your model seems to get the trend correct in that long hitting streaks were more likely in the late 1800’s and early 1900’s. If you look at a list of the longest hitting streaks you will find that most did indeed occur in this time frame (1). However, your model again seems to greatly over-predict the likelihood of a long hitting streak. Thus, either those who are very knowledgeable about the game and who think DiMaggio’s streak is unlikely to ever be bested are wrong, or something is fundamentally wrong with your model.

    My opinion is the latter, as I cannot imagine people taking 33:1 odds of someone breaking the streak this (or any other) season. I would certainly want a much better return on my bet, and I think the bookies would be happy if bettors believed your analysis. Obviously, all models are incorrect to a certain extent, but I think your results suggest that your model is much too simple. While it’s always good to start with the simplest model to see how it does, when it does not reproduce known results (or reality) it is important to think about the assumptions used. Obviously, others here have suggested some improvements. However, I think the fundamental flaw is that the variability in pitching quality a batter sees over the course of a season does not seem to have been included. For a hitting streak, getting two hits off a bad pitcher one day does not overcome going hitless against a better pitcher the next day. A much greater spread in pitching quality in the early days of baseball might explain why your model suggests more and longer streaks should have occurred in those days. Is there data on batting average against pitchers that you could include?

    (1) Although the rules of baseball did not really stabilize until around 1900.

  14. Stuart Rojstaczer April 2, 2008 at 1:57 pm Permalink

    Sorry to say, this analysis is not at all convincing. The basic problem is that your model, a probabilistic interpretation of a physical event, is uncalibrated. Uncalibrated statistical models are seldom, if ever, useful.

    There have been hundreds of thousands of games already played in MLB. That is far more than enough empirical data to look at the likelihood of hitting streaks of a certain magnitude. And your modeling data are far, far off the empirical data in terms of frequency of 20 game hitting streaks, 30 game hitting streaks, etc.

    This is flat out not good modeling. It certainly would not pass peer review if you tried to do the same type of work on a science problem. Sorry to give you the bad news.

    Another way of looking at the data is to simply look at recurrence intervals of baseball hitting streaks. Here are that data to one significant digit:

    Streak (games)    Recurrence interval (years)
    20-24             0.3
    25-29             1
    30-34             4
    35-39             20
    40-44             30
    45-49             100

    Do a simple plot on log paper and you’ll find a nice straight line for these recurrence intervals. Extrapolate to DiMaggio and you get a recurrence interval of about 1000 years. I’ve done that at fortyquestions.blogspot.com for what it’s worth.
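
The extrapolation described here can be sketched as a straight-line fit on a logarithmic scale. This is a reconstruction under stated assumptions, not Stuart's own calculation: each bin is represented by its midpoint (with the second bin read as 25-29), "log paper" is taken to mean the logarithm of the recurrence interval plotted against streak length, and the fit is ordinary least squares.

```python
import math

# Bin midpoints (games) and recurrence intervals (years) from the table.
# Assumption: midpoints stand in for each bin; second bin read as 25-29.
midpoints = [22, 27, 32, 37, 42, 47]
intervals = [0.3, 1, 4, 20, 30, 100]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


log_intervals = [math.log10(years) for years in intervals]
slope, intercept = fit_line(midpoints, log_intervals)


def recurrence_years(streak_length: float) -> float:
    """Recurrence interval implied by the fitted line."""
    return 10 ** (intercept + slope * streak_length)


estimate_56 = recurrence_years(56)
print(f"fitted slope: {slope:.3f} log10(years) per game")
print(f"extrapolated recurrence interval for 56 games: {estimate_56:.0f} years")
```

Under these assumptions the extrapolated value lands on the order of a thousand years, in line with the figure quoted in the comment. Because the table is rounded to one significant digit and the extrapolation reaches well past the last bin, the result should be read as an order-of-magnitude estimate.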

    You need to calibrate your model first. Then you can start talking about just how likely another DiMaggio event will come along.

    As an aside, analyses like these irk me in a funny way because you’re not talking about some abstract thing. You’re talking about the storied achievements of a real human being. And my view is that your analysis is not only wrong, but also denigrates the individual. I seriously think you should consider retracting your article. And I think you owe the DiMaggio family an apology. Just an opinion.

  15. Chris April 3, 2008 at 2:47 pm Permalink

    I want to follow up on the recent posts on this blog. I think we can draw a few conclusions and point in the direction of future work.

    1. The original NYT article, though well intentioned, missed the mark and overstated (perhaps quite seriously) the potential for baseball players to hit long hitting streaks.

    2. Numerous pieces of evidence support point 1, not the least of which is the fact that the implied probabilities would fail to clear competitive gambling markets (see Eric’s post) and that the predicted long hitting streaks do not match (or come anywhere close to) the actual record of hitting streaks (see Stuart’s post).

    3. As an aside, Stuart also presents good evidence that baseball hitting streaks may be characterized by a rank-size distribution (power rule) in the vein of Zipf’s Law. (In fairness, I do not wish to put words in Stuart’s mouth. He never said “power rule”. I hope he would agree that the pattern in the data he showed us mimics a power rule remarkably well and that the power rule idea reinforces his earlier argument.) For the uninitiated to come quickly up to speed, see the Wikipedia articles on “Zipf’s Law” and “rank-size distribution”.

    4. It would advance science and our collective understanding of applied mathematics to understand *why* the article missed the mark. To that end, I have proposed a potential causal mechanism, namely a failure to appreciate that diversity in hitting probabilities cannot be casually summed up by an arithmetic mean of observed data, and have presented small numerical examples to suggest the effect of this error may be substantial. Grant has suggested that variations in at-bats may be such a source, Eric has suggested variation in pitcher quality, and I have suggested variation in team quality. I could imagine any of the above being important and would like (from a modeling perspective) to see as sparse a model as possible to capture the problem and appropriately align the model’s predictions with real-life data.

    5. These exercises are quite difficult to follow up without access to relevant data and code. Is there some way to make at least some of this information available to assist in this work?

    Thank you again. I am happy to see this thread has maintained an elevated and sober discussion of the relevant issues.

  16. Sam Arbesman April 3, 2008 at 5:38 pm Permalink

    Everyone’s comments have been great; thanks so much.

    You are all correct that the model is quite simple and that allowing for variation in probabilities of getting a hit in a game will lower the chances of having long streaks. This is the difference between arithmetic means and geometric means. Carl Bialik, who writes the Numbers Guy blog at the WSJ, discusses this in today’s post, “In Defense of Joe DiMaggio.”
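
The arithmetic-versus-geometric-mean point can be made concrete with a small sketch. The 56-game schedule alternating between 0.90 and 0.72 is hypothetical, chosen only so that the arithmetic mean is 0.81; the helper names are introduced here for illustration.

```python
import math


def arithmetic_mean(values: list) -> float:
    return sum(values) / len(values)


def geometric_mean(values: list) -> float:
    return math.prod(values) ** (1 / len(values))


# Hypothetical schedule: per-game hit probability alternates 0.90 / 0.72.
per_game_probs = [0.90, 0.72] * 28  # 56 games, arithmetic mean 0.81

# A fixed-probability model plugs the arithmetic mean into every game.
fixed_model = arithmetic_mean(per_game_probs) ** len(per_game_probs)

# The actual chance of hitting in all 56 games is the product of the
# per-game probabilities, i.e. the geometric mean raised to the 56th power.
varying_model = math.prod(per_game_probs)

print(f"arithmetic mean: {arithmetic_mean(per_game_probs):.4f}")
print(f"geometric mean:  {geometric_mean(per_game_probs):.4f}")
print(f"fixed-probability streak chance:   {fixed_model:.3e}")
print(f"varying-probability streak chance: {varying_model:.3e}")
```

Since the geometric mean never exceeds the arithmetic mean, any day-to-day variation around the same average can only lower the chance of an unbroken streak.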

    @Chris: My source for the data is Sean Lahman’s Baseball Archive. Please feel free to play with it.

  17. Stuart Rojstaczer April 4, 2008 at 10:55 am Permalink

    I agree with Chris’s remarks. I didn’t use the words “power rule,” but that’s exactly what appears to be happening. And it isn’t surprising that it does. I avoided the use of that phrase and avoided writing down the best fit equation because I didn’t want to scare people away. But you’re not putting words in my mouth. They are exactly right.

    For me, an interesting question to look at is just why using fine scale probabilities based on batting averages doesn’t scale up to the observed behavior of hitting streaks. Scaling up is almost always a problem in modeling of real systems. The question is: can you identify in an interesting way what’s missing from the approach used here so far?

  18. investorama February 12, 2009 at 6:18 am Permalink

    Your blog inspires me!!! Thanks!