We’ve added this note here because the line breaks do not appear correctly at the historians.org site, seriously diminishing the readability of the piece.
It’s great to read such extraordinarily positive comments about culturomics, Anthony. We’re so pleased that you enjoyed our presentation at AHA. We agree with nearly everything you wrote.
However, we did want to take a moment to respond to a few of your more critical observations. They relate to issues – and occasionally misconceptions – that have been widely discussed in the context of our paper, and clarifying them may bear on some of the bigger points you make.
As you may know, the authors of our paper include PhDs in English Literature, Chinese Literature, Psychology, Media Arts and Sciences, Computer Science, Biology, and Mathematics, as well as at least one Master’s degree in history. You rightly point out that this wide-ranging author list did not include an academic historian. You then suggest a diagnosis, writing that “historians have not established, in the eyes of many of their colleagues in the natural sciences, that they possess expert knowledge that might be valuable, or even crucial — even when a scientific project is concerned with reconstructing part of the human past.”
We disagree with this diagnosis, and suspect that it emerges from a misunderstanding about how the project was actually carried out.
Part of the issue is a latent – and widespread – assumption that the absence of an academic historian on the author list means that we did not seek out and receive significant input from academic historians.
This assumption seems to turn on questions of how multiple authorship works in the sciences, and the difference between the role of an author and the role of someone who appears in the acknowledgments of a paper. Part of what makes multiply-authored papers work – without trivializing the role of individual authors – is that the bar for authorship is, and ought to be, set quite high. Each author needs to make specific contributions to specific parts of the data collection or analysis. I (Erez) have a close scientific mentor (Michael Brenner, the chair of my PhD thesis committee) who has given me invaluable feedback and advice time and time again for over seven years, completely changing the course of specific projects and even of my work as a whole. And yet he has never been my co-author: when I ask him to be a co-author on the resulting papers, he always says no, and reminds me that each co-author needs to make specific contributions and that he doesn’t feel that his own are specific enough to meet that bar. This is an example of great scientific integrity, but it highlights the fact that there’s a difference between the guides who make great science possible, and the people who do the work. The former are usually recognized in the acknowledgments of a paper, not on its author list. This is where I have often acknowledged Michael.
Every author on our recent paper directly contributed to either the creation of the corpus, or to the design and execution of the specific analyses we performed. No academic historians met this bar.
But it does not follow that we lacked input from historians. In fact, we sought and received extensive input from historians; the value of their “expert knowledge” was apparent to us from the get-go.
For instance, your column suggests three Harvard historians who would have been natural to contact: Michael McCormick and the MacArthur winners Robert Darnton and Ann Blair. What you may not realize is that two of the three – Michael and Bob – were involved with, and supporters of, our work from quite early on, providing us with regular guidance and feedback throughout the lifetime of the project.
Bob Darnton in particular served as an important bellwether. Because he is an academic historian, a MacArthur winner, Director of Harvard University Library, one of your predecessors as AHA president, and – crucially – the most outspoken academic critic of Google Books, we knew we could trust him to be both wise and skeptical in his assessment of our ongoing work and the compromises (such as releasing the data in N-gram format) that we made in order to make it a reality. We therefore approached him early on, and presented our work to him repeatedly and in great detail over the years in which the project took shape. We took his comments and critiques very seriously. He, in turn, was a major supporter of our project, even going so far as to propose brown-bag lunches at Harvard Library where we got extensive input on our work. On the whole, he has been enormously positive, and has repeatedly counseled us on the urgency of making an N-gram data release happen, both for historians and for researchers of all kinds. He also partnered with us to create an extensive – and ongoing – project using Harvard Library data.
Both he and Prof. McCormick are named in the acknowledgments of the paper.
We initially sought even greater participation by professional historians, which might have led to authorship. Early on, this was one of our highest priorities. But we found that setting up day-to-day working collaborations with historians was much harder than we expected.
Some historians were just not interested. For example, on one failed “recruiting” trip, a prominent historian (whose work we very much admire) told us that there would be little interest in our project: historians had tried quantitation in the 50s, and it hadn’t worked out.
Even when we found historians who shared our enthusiasm, there were still great barriers to working together. For instance, Michael McCormick helped us convene a multi-hour meeting with himself and about a dozen interested history students and faculty. The historians who came to the meeting were intelligent, kind, and encouraging. But they didn’t seem to have a good sense of how to wield quantitative data to answer questions, didn’t have relevant computational skills, and didn’t seem to have the time to dedicate to a big multi-author collaboration.
It’s not their fault: these things don’t appear to be taught or encouraged in history departments right now.
Ultimately, a large team was depending on us, and we had to keep moving forward.
I think one lesson here may be that while “expert knowledge” is important, shared paradigms, a shared language, and common intellectual values are a big part of what makes a successful team come together. This suggests that history departments have to grapple with several emerging responsibilities: to encourage familiarity with quantitative methods, with computational techniques, and – as you so eloquently wrote – with large-scale collaboration.
We also wanted to expand briefly on our proposal to digitize all the world’s pre-1900 texts by 2020.
First, decipherment need not precede digitization. At a minimum, digitization requires us to take a high-quality picture of a text. Even better would be a digital character stream. Creating this does not require us to decipher a text, only to figure out the set of characters that the text tends to use. For instance, the Voynich manuscript has been digitized, despite the fact that no one knows what it means and that it may not mean anything at all.
Second, we recognized that this would be extraordinarily difficult to do by 2020. But “impossible” is far too strong a word. The human genome project, with a price tag of $3B, and the Large Hadron Collider, at $9B, are examples of collaborative intellectual projects at scales unthinkable in the humanities today. We believe the case for the digital archiving of recorded history warrants comparable outlays. Our very deliberate purpose in proposing such an extraordinarily ambitious goal was to urge our listeners to think about what ‘Big Humanities’ could mean for us all. We don’t know whether a motivated nucleus of scholars could lobby the governments and philanthropists of the world effectively enough to achieve such a dream. But if our dreams are big enough, even our failures will change the world.
Erez Lieberman Aiden