People are often shocked by the level of detail in the Negro League statistical compilations I do, especially when they include fielding statistics or relatively minor categories like sacrifice hits or hit batsmen. That surprise comes mainly from two sources: 1) a basic (and completely understandable) unawareness of the nature of baseball journalism, particularly the box score, in the early twentieth century; and 2) the legend of the Negro Leagues, which paints them as half-mythical enterprises that took place mostly in the realm of tall tales, an image I will have much more to say about in future posts.
I was first drawn into Negro League research when, on a whim, I looked up a few African-American newspapers and found vastly more material there than I had dreamed possible after reading Robert W. Peterson, John Holway, James Riley, and others, most of whom stressed the poverty and incompleteness of “objective” statistical data, or claimed that it was completely unknowable. When I followed up by looking into some mainstream daily papers and realized that, in the 1920s and before, Negro League games were regularly reported in many of them, with box scores that generally surpassed the quality of the boxes in the black weeklies, it really clinched the deal. Realizing that you could count how many wild pitches Bullet Rogan was charged with, or how many times Jud Wilson was hit by a pitch, or that you could actually compile fielding statistics (and thus range factors, fielding percentages, etc.) for the reputedly slick-gloved Dick Lundy, fired the imagination far more than any endlessly repeated anecdote or silly tall tale about how fast Cool Papa Bell was.
Anyway, I thought it would be interesting to occasionally post some scans here for those who’ve never actually seen Negro League box scores, maybe talking a little about the issues inherent in analyzing them. Since I just posted 1916 statistics, here’s a sample of box scores from that year.
First, here’s the box score for a July 27, 1916, game between the St. Louis Giants and (western) Cuban Stars. It comes from the St. Louis Globe-Democrat, a (white) daily paper that, for every year I’ve researched in the 1910s and 1920s, featured generally excellent box scores of Giants and Stars games:
Sometimes the boxes are a little approximate about innings pitched, which means you can’t in every instance follow what they say precisely. In the account of this game (which I couldn’t include in the image, due to limitations of my editing program), it is said that Pedroso was driven from the mound in the sixth. Together with the notation that Padrón (this would be Juan, the American lefty, not Luis, the Cuban) pitched 2 1/3 innings, this led me to credit Pedroso with 5 2/3 innings pitched, not six, as the box score gives.
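For anyone who wants to see the arithmetic behind that call, here is a minimal sketch in Python. It counts innings in outs (three per inning) and rests on two assumptions of mine that the box score does not spell out: that Pedroso and Padrón were the only Cuban Stars pitchers in the game, and that the team's total came to a whole number of innings. The helper names are made up for illustration.

```python
def to_outs(whole_innings, thirds=0):
    """Convert innings pitched (whole innings plus thirds) to outs recorded."""
    return whole_innings * 3 + thirds


def format_ip(outs):
    """Render a count of outs in the usual innings-pitched notation."""
    whole, thirds = divmod(outs, 3)
    return f"{whole} {thirds}/3" if thirds else str(whole)


# From the box score: Padrón pitched 2 1/3 innings.
padron_outs = to_outs(2, 1)

# "Driven from the mound in the sixth": Pedroso finished five innings
# but not the sixth, so he recorded 15, 16, or 17 outs.
candidates = [to_outs(5, thirds) for thirds in range(3)]

# Assumption (mine, not the box score's): the two pitchers' outs add up
# to a whole number of innings.
consistent = [c for c in candidates if (c + padron_outs) % 3 == 0]

for c in consistent:
    print(f"Pedroso {format_ip(c)} + Padrón {format_ip(padron_outs)} "
          f"= {format_ip(c + padron_outs)} innings")
```

Under those assumptions only 5 2/3 fits. The six innings given in the box would leave the two pitchers with 8 1/3 between them.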
Here’s a game account and box score from the Indianapolis Freeman (8-19-1916), for an August 13 game between the Lincoln Giants and (eastern) Cuban Stars (in separate images):
Note the difference between the line score, which gives the Lincolns 8 runs, and the table, which gives them 10 in the totals line, while the individual players’ runs scored add up to only 9! Typical box score fun and games. In this case the discrepancy originated with Cyclone Joe Williams’s walkoff three-run home run. As the Chicago Defender’s account of the game noted, “As only one run was needed to win the game the hit went for a single.” Under that ruling the official score would have been only 8 to 7, and Williams and the other runner would not be credited with runs scored. This was the rule at the time in the major leagues, and many players “lost” home runs as a result, Babe Ruth among them (he really hit 715 home runs).
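To make the three competing totals easier to follow, here is a rough sketch in Python of how the two scoring conventions play out. The 7-7 score before the hit is my inference from the Defender's "only one run was needed" and the 8 to 7 official result; it is not printed in the box. The function and variable names are hypothetical.

```python
def runs_after_walkoff_homer(runs_before, opponent_runs, runs_on_homer, old_rule):
    """Home team's final run total after a game-ending home run."""
    if old_rule:
        # Old practice: only as many runs count as were needed to win.
        runs_needed = opponent_runs - runs_before + 1
        return runs_before + min(runs_on_homer, runs_needed)
    # Scored as a full home run: every runner and the batter count.
    return runs_before + runs_on_homer


def classify(runs, official, revised):
    """Say which scoring convention a printed run total agrees with."""
    if runs == official:
        return "matches the single ruling"
    if runs == revised:
        return "matches the full home run"
    return "matches neither"


# Inferred game state before Williams's hit (an assumption, see above).
lincolns_before, cuban_stars = 7, 7

official = runs_after_walkoff_homer(lincolns_before, cuban_stars, 3, old_rule=True)
revised = runs_after_walkoff_homer(lincolns_before, cuban_stars, 3, old_rule=False)

# The three Lincoln run totals that appear in the Freeman's box score.
freeman = {"line score": 8, "totals line": 10, "sum of individual runs": 9}
verdicts = {label: classify(runs, official, revised) for label, runs in freeman.items()}

for label, runs in freeman.items():
    print(f"{label}: {runs} ({verdicts[label]})")
```

The line score follows the single ruling, the totals line follows the full home run, and the individual runs column agrees with neither.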
I have unapologetically played the revisionist in cases like this (this is actually the only one I’ve confronted so far, that I can remember), for these reasons:
1) There was obviously some disagreement about it at the time, as even then the rule was counter-intuitive: thus the confusion in the Freeman’s runs scored column.
2) There was no governing body to rule on such questions, since no Negro League actually existed. There was no “official” score, only the numbers and rulings the newspapers came up with; and they were simply (albeit unevenly) applying common practice.
3) Most importantly, the whole point of recovering Negro League statistics is to characterize as accurately as possible what happened; and Williams hit a three-run home run, not a single. Especially when you consider the smallness of the samples involved, taking away his home run could result in his achievements and abilities being badly misunderstood.
Lastly, I thought I’d post a sample of one of the play-by-play accounts published in both the Defender and Freeman, just to give you an idea of what they looked like. This is from an account of an American Giants/ABCs game in the September 2, 1916, Defender:
Scott, thanks again. Yeah, I don't want to come down too hard on those guys (Holway, Riley, & co.), because of the role they played in keeping the Negro Leagues alive in all the ways you point out. At the same time, there's no question that the statistical work has left much to be desired.
I'm going to write a lot about the issues you raise in this blog over the next few weeks or months, so I'll try to answer fairly briefly. I think compiling stats was difficult for the first generation of NeL historians for these reasons:
1) They were/are pretty much amateur historians, not academics, which can make access to materials more difficult;
2) The NeL historians started out before the age of PCs, and I think most NeL statistical work was done by hand until very recently;
3) There are major conceptual differences between the way I or Patrick Rock, say, look at statistics, and the way first-generation NeL historians do. These differences can be summed up by the word I use in the blog's subtitle: "sabermetric." Not a word I necessarily like, but it captures the difference in world view.
Just as an example, where more traditional NeL historians will try to find out how many home runs Josh Gibson hit, I'm more likely to want to find out everything I can about, say, the 1922 Negro National League. Context (league totals, park effects, unbalanced schedules, etc.) is everything, IMO. Also, like more recent sabermetric analysts (James himself, the Baseball Prospectus guys, and others), I find fielding statistics to be meaningful and necessary to a complete account of the game.
As for why there's not more statistical work out there--I dunno. Check out Patrick Rock's work, if you haven't already. I think part of it is just that it's damned hard to do and doesn't pay anything, so very few people are stupid enough to try!
Posted by: Gary Ashwill | May 18, 2006 at 09:57 PM
Once again, great stuff Gary. Let's face it, without Holway, Peterson and Clark, you probably wouldn't have this blog (well, maybe you would, but it might be about Melville or Aimee Mann, and I wouldn't be reading it). They've blazed the trail the past four decades, scrambling around in a race against time, tracking down former NeL'ers to gather their personal accounts and anecdotal information before it's too late, and creating the "myths" and hyperbole that hooked our interests in the first place. They're to be commended for their efforts. But where we're at today (in terms of the current state of statistical research) still begs the question (and I hope you'll take a stab at answering it): With the SABR Negro League committee, with people like Holway, Clark, Riley and the late Peterson, with a research grant from MLB/Hall of Fame (the mediocre "Shades of Glory" book by Hogan and forthcoming? NeL Encyclopedia on the way), with the internet and microfilm and Microsoft Excel and dozens if not hundreds of individuals passionate about the subject of Negro League Baseball, how come YOUR work on 1916, 1921 and the 1927/28 Cuban League seasons far exceeds the published efforts of the army of individuals behind the "established" Negro League research community?
How can one man (in a span of just a few years) blow their work away? I'm befuddled here in the Chicago suburbs.
Posted by: Scott Simkus | May 18, 2006 at 01:55 PM