Introduction to the Special Issue on Digital Humanities and Computational Linguistics

Digital Humanities (DH) seeks to support research in the Humanities disciplines using digital, computational techniques. Its exact definition is discussed often and can even be the subject of interesting debate (Vanhoutte, Nyhan, and Terras 2013), but we need not linger on definitional issues here. At present, DH invites contributions from all Humanities disciplines, including those where language plays a secondary role, such as anthropology, archeology, the fine (visual) arts, film studies, and musicology. These are not the disciplines computational linguists are most likely to get involved in, but linguistics and literary studies are also Humanities disciplines, and in them language is central. The same holds for history and philosophy: language is not their central object of interest, but archival material in textual form often plays a central role in them. Computational linguistics (CL) therefore has enormous opportunities to contribute to all the disciplines where language and text are important.
Just as in other computational disciplines, the fundamental benefits that DH can bring to its non-computational parent disciplines are the ability to deal with large amounts of data, the speed with which analyses can be tested, assessed, and criticized, and finally, the commitment to well-codified procedures, which are easier to test, replicate, and modify. All of these benefits are already being realized in some projects. Jockers (2013) analyzes 3,500 American, Irish, and English novels of the nineteenth century, exploring especially the trends in themes over this period, e.g., when the religious themes of sin and salvation were popular in the different countries. By a conservative estimate, 3,500 novels would occupy about 100 meters of shelf space, and merely reading them, without yet taking notes or analyzing them, would take a disciplined reader nearly ten years at one novel a day.
As larger amounts of material become available, the scope of projects such as Jockers's will grow accordingly. Speed is of course related to the first advantage, capacity, since capacity would be pointless if analyses could not be produced promptly. Nerbonne et al. (2011) describe Gabmap, a web application for dialectology. Gabmap requires users to input dialect data as a table organized by sites along one dimension and by the linguistic items whose forms vary along the other; each cell indicates which form is used at a given site. The data may be categorical, such as lexical or syntactic choices; numerical, such as the formant frequencies of vowels; or strings, such as pronunciation transcriptions.
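To make this input format concrete, the following is a minimal Python sketch of a site-by-item table holding the three kinds of data just mentioned, together with one way of computing an aggregate distance between two sites. It is a hypothetical illustration, not Gabmap's code or file format: the site names, item names, and values are invented, and the per-cell measures (a 0/1 mismatch for categorical data, an absolute difference scaled by the item's range for numerical data, and a length-normalized Levenshtein edit distance for strings) are assumptions chosen for the example. The sketch also assumes that every cell is filled.

```python
"""Hypothetical sketch: a site-by-item dialect table and an aggregate site distance.

The data, item names, and distance choices are illustrative assumptions,
not Gabmap's implementation or input format.
"""
from typing import Dict, Union

Cell = Union[str, float]
Table = Dict[str, Dict[str, Cell]]  # site -> item -> value

# How each item (column) is compared; the three kinds mirror the text above.
ITEM_KINDS: Dict[str, str] = {
    "word_for_potato": "categorical",  # lexical choice
    "F1_of_a_hz": "numerical",         # vowel formant frequency in Hz
    "pron_of_bread": "string",         # pronunciation transcription
}

# Invented values: one row per site, one column per item.
TABLE: Table = {
    "SiteA": {"word_for_potato": "aardappel", "F1_of_a_hz": 750.0, "pron_of_bread": "brot"},
    "SiteB": {"word_for_potato": "eerpel", "F1_of_a_hz": 700.0, "pron_of_bread": "broot"},
    "SiteC": {"word_for_potato": "aardappel", "F1_of_a_hz": 760.0, "pron_of_bread": "brood"},
}


def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                  # delete ca
                            curr[j - 1] + 1,              # insert cb
                            prev[j - 1] + (ca != cb)))    # substitute or match
        prev = curr
    return prev[-1]


def cell_distance(kind: str, x: Cell, y: Cell, value_range: float) -> float:
    """Distance between two cells, scaled to [0, 1] so item kinds can be averaged."""
    if kind == "categorical":
        return 0.0 if x == y else 1.0
    if kind == "numerical":
        return abs(float(x) - float(y)) / value_range if value_range > 0 else 0.0
    if kind == "string":
        sx, sy = str(x), str(y)
        return levenshtein(sx, sy) / max(len(sx), len(sy), 1)
    raise ValueError(f"unknown item kind: {kind}")


def site_distance(table: Table, kinds: Dict[str, str], s: str, t: str) -> float:
    """Mean per-item distance between sites s and t (assumes no missing cells)."""
    total = 0.0
    for item, kind in kinds.items():
        column = [row[item] for row in table.values()]
        # Numerical differences are divided by the item's range across all sites.
        value_range = max(column) - min(column) if kind == "numerical" else 0.0
        total += cell_distance(kind, table[s][item], table[t][item], value_range)
    return total / len(kinds)


if __name__ == "__main__":
    sites = sorted(TABLE)
    for i, s in enumerate(sites):
        for t in sites[i + 1:]:
            print(f"{s}-{t}: {site_distance(TABLE, ITEM_KINDS, s, t):.3f}")
```

Running the script prints one line per pair of sites; with these invented values, SiteA and SiteC come out closest (0.189), since they share the lexical variant and have similar F1 values. Scaling every item to the range 0 to 1 before averaging is one design choice among several: it keeps a single numerical item measured in hertz from swamping the categorical and string items.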