The DARPA TIPSTER project

This note is the first of four papers in this issue describing the ongoing work connected with the DARPA TIPSTER Project. It provides an overview of the project; the three papers that follow, each by one of the contractors involved, give some details on their systems and some of the initial results.

The TIPSTER project is sponsored by the Software and Intelligent Systems Technology Office of the Defense Advanced Research Projects Agency (DARPA/SISTO) in an effort to significantly advance the state of the art in effective document detection (information retrieval) and data extraction from large, real-world data collections. The first two-year phase of the program is concerned with the development of algorithms for document retrieval, document routing, and data extraction that are both domain and language independent. A call for proposals was made in June of 1990, and contracts for the six participating groups were let in the fall of 1991. Three meetings have been held so far, with the first results presented in September of 1992.

TIPSTER has two separate but connected parts. The first part, document detection, is concerned with retrieving relevant documents from very large (3 gigabyte) collections of documents, both in a routing environment and in an adhoc retrieval environment. The routing environment is similar to the document filtering or profile searches currently done in libraries, where the query topic is constant and the documents are viewed as an incoming stream of publications. The adhoc environment is similar to the standard search done against static collections, where the collection is fixed and the queries change.

The second part of the TIPSTER project is concerned with data extraction. Here it is assumed that there is a much smaller set of documents, presumed to be mostly relevant to a topic, and the goal is to extract information to fill a database. This database could then be used for many applications, such as question-answering systems, report writing, or data analysis. The data extraction part of TIPSTER is being done by groups using natural language understanding techniques, and this part will not be described in this issue.
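
The difference between the routing and adhoc environments described under document detection can be illustrated with a minimal Python sketch. This is not any TIPSTER contractor's system: the term-overlap `score` function, the names `adhoc_search` and `route`, and the `threshold` and `top_k` parameters are all hypothetical, chosen only to show which side (query or documents) is held constant in each setting.

```python
from collections import Counter
from typing import Iterable, Iterator, List


def score(query: str, document: str) -> int:
    """Toy relevance score (an assumption for illustration):
    total occurrences of the distinct query terms in the document."""
    terms = set(query.lower().split())
    counts = Counter(document.lower().split())
    return sum(counts[t] for t in terms)


def adhoc_search(query: str, collection: List[str], top_k: int = 2) -> List[str]:
    """Adhoc: a new query is run once against a static collection,
    and the best-matching documents are returned in ranked order."""
    ranked = sorted(collection, key=lambda d: score(query, d), reverse=True)
    return [d for d in ranked if score(query, d) > 0][:top_k]


def route(standing_topic: str, stream: Iterable[str], threshold: int = 1) -> Iterator[str]:
    """Routing: a constant topic is applied to documents as they arrive,
    passing along each one that scores at or above the threshold."""
    for doc in stream:
        if score(standing_topic, doc) >= threshold:
            yield doc


if __name__ == "__main__":
    docs = [
        "oil prices rise",
        "new chip design announced",
        "oil exports fall as prices drop",
    ]
    print(adhoc_search("oil prices", docs))       # static collection, one-off query
    print(list(route("chip design", iter(docs)))) # fixed topic, incoming stream
```

In the adhoc function the whole collection is available and can be ranked at once; in the routing function each document must be accepted or rejected as it arrives, which is why the sketch uses a threshold rather than a ranking.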