CORPUS LINGUISTICS IN FINLAND: A RESOURCE SURVEY

Finnish corpus linguistics, and computational linguistics generally, has a long tradition. This tradition has earned it authority in the international community and has produced solid results in various areas. As in many other countries, the first electronic corpus projects appeared in the 1960s [1, 2]. From the start, the field in Finland was closely tied to three things: the writing of original software for text processing, strong international links, and engagement with current issues in lexicography and grammatical description ([3-6]; see also the round-table material on corpus linguistics in “Korpuslingvistiikan työpaja l: Korpukset ja ohjelmat”, pp. 126-134 of [7]). A distinctive feature of Finnish computational linguistics has been its close connection with the development of end-user products, including collaboration with commercial firms [8; 1, pp. 50-54 and 62-64]. This paper is a survey. Its main purpose is to give Russian linguists an overview of the principal computational linguistic resources in Finland and of the possibilities for using them. The current status of each existing corpus is given; in some cases this is limited to the place where it was created and first stored. Rather than describing each corpus in full, I indicate where detailed descriptions can be found (on the Internet and/or in print). Many of the resources described below can be accessed remotely. Most of the servers run Unix, which generally requires the user's machine to have a suitable client, e.g., FSecure SSH-Client. I do not discuss the technical and organizational aspects of access in detail. I merely note that almost all of the resources are freely available for research and teaching purposes, although in most cases permission must first be obtained from the administrator or owner of the corpus. Contact information is given on the corresponding websites or in articles on the topic.
One further comment is important, and it concerns the definition of the term corpus. The term has several meanings and is often used loosely, so there is a general tendency to call any collection of texts in digital form an electronic corpus. At the same time, the term corpus has recently been used more and more often not for arbitrary running text but for linguistic material specially selected according to certain principles. “So a corpus in modern linguistics, in contrast to being simply any body of text, might more accurately be described as a finite-sized body of machine-readable text, sampled in order to be maximally representative of the language variety under consideration” [9, p. 24]. Despite the spread of this newer approach, however, older corpora (i.e., simply collections of electronic texts) retain their linguistic value in many areas. This reflects substantial differences between languages in the quantity and quality of the work done. For English, for example, recent text collections address considerably more complex tasks (various types of annotation, parallel corpora, speech recordings in electronic form, and so on). Many other modern languages, by contrast, still lack even a simple, well-balanced representative corpus, let alone an annotated one. Creating a corpus of ancient texts raises its own, equally difficult, problems.