Spoken word collections promise access to unique and compelling content, and most of the technology needed to realize that promise is now in place. Decreasing storage costs, increasing network capacity, and the availability of software to encode and exchange digital audio make possible physical access to spoken word collections at a previously unimaginable scale. Effective support for intellectual access (the problem of finding what you are looking for) is much more challenging, however. In this talk I will briefly describe work that has been done on this problem at the Text Retrieval Conferences, in the Topic Detection and Tracking evaluations, and in individual research projects around the world. I will then describe a unique resource, a collection of 116,000 hours of oral history interviews recorded in 32 languages in 57 countries that has been assembled by the Survivors of the Shoah Visual History Foundation. Nearly 10,000 hours of this audio have been manually segmented, summarized and indexed, making this an unrivaled resource with which we can explore a broad array of data-driven techniques. My main focus will be to explain how we are leveraging this exceptional resource to develop the ability to index similar materials automatically. The project, which we call MALACH (Multilingual Access to Large spoken ArCHives), builds on a long heritage of increasingly demanding applications for speech recognition technology. The accented, emotional and elderly speech in the Shoah Foundation’s collection is so challenging that state-of-the-art systems initially yielded a 90% word error rate! We now have speech recognition systems that achieve less than half that error rate for two languages, English and Czech. That’s nowhere near good enough to produce readable transcripts, but it is approaching a point where other language technologies can begin to make headway.
I’ll illustrate that point with our latest results from across the project on speech recognition, natural language processing components, and information retrieval system design. The scope of this one project is breathtaking, directly involving nine research teams from six institutions on two continents (Charles University, IBM T.J. Watson Research Lab, Johns Hopkins University, the Shoah Foundation, the University of Maryland, and the University of West Bohemia), with interests that range from the information needs of historians to the modeling of Czech colloquial pronunciation. Virtually every topic in computational linguistics finds expression in that range. We plan to ultimately build speech recognition systems in at least five languages (adding Russian, Polish and Slovak to what we have now), so morphology and language modeling are critical issues. The diverse range of languages in the collection make