1 Introduction

It is commonly recognized that the era of modern corpus linguistics is approaching the half-century mark. The past 50 years have seen a series of landmark events in the field, ranging from the early attempts at mechanolinguistics by Juilland and Busa in the 1950s to the pioneering work on computerized corpora in the 1960s and 1970s, which involved first written material in the form of the Brown Corpus and the LOB Corpus, and later spoken material in connection with the London-Lund Corpus. The 1980s brought the large-scale corpus projects of Cobuild and the [...]

Having now entered the 21st century, it is clear that there are new challenges ahead for the corpus linguist. In terms of standard corpora, for example, we know that the American National Corpus (ANC) is under development, a parallel to the BNC with 100 million words of transatlantic English (e.g. Ide et al. 2002), and a great deal of work is also going on with sophisticated varieties of learner corpora and multilingual (parallel) corpora (e.g. Botley et al. 2000; Granger 2004). However, the biggest challenge today is undoubtedly the growing body of text-based information available on the World Wide Web (henceforth the Web). Although originally intended purely as an information source, this material in fact forms the largest store of textual data in existence, and as such it constitutes a tantalizing resource for various linguistic purposes.

Let us look at some initial figures. As regards the size of the material on the Web, a rough estimate indicates that there are currently (December 2004) about eight billion Web pages available (cf. [...]), containing perhaps as much as 50 terabytes of text. At a generous average of 10 bytes per word (cf. Kilgarriff and Grefenstette 2003), these figures suggest a body of no less than five trillion (5 000 billion) words in one form or another.
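The arithmetic behind this estimate can be checked with a short sketch. It assumes that a terabyte here means the decimal unit of 10^12 bytes, which the text does not specify; the function name `estimate_word_count` and the constant `TERABYTE` are illustrative only.

```python
def estimate_word_count(total_bytes: int, bytes_per_word: int) -> int:
    """Return a rough word count for a body of text of the given size."""
    if bytes_per_word <= 0:
        raise ValueError("bytes_per_word must be positive")
    return total_bytes // bytes_per_word


# Assumption: decimal terabyte (10**12 bytes); the source does not say which unit is meant.
TERABYTE = 10**12

web_text_bytes = 50 * TERABYTE  # "as much as 50 terabytes of text"
words = estimate_word_count(web_text_bytes, bytes_per_word=10)

print(f"{words:,} words")  # 5,000,000,000,000 words, i.e. five trillion
```

With a binary terabyte (2^40 bytes) the result would be roughly ten per cent higher, which does not change the order of magnitude of the estimate.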
Out of this massive multilingual collection of texts and text fragments, it appears that about two thirds are written in English (e.g. Xu 2000), although the proportion of non-English texts seems to have increased in recent years (e.g. Grefenstette and Nioche 2000). This means that there is probably something in the range of 3 000 billion words of English to be found on the Web, forming a virtual English supercorpus ready for use by enterprising linguists in all manner of language research (cf. Bergh et.