MODERN CORPUS TECHNOLOGIES IN DICTIONARY COMPILATION

DOI:

https://doi.org/10.24919/2522-4565.2023.53.11

Keywords:

electronic dictionary, word matching, corpus linguistics, text corpus, corpus technologies, corpus creation

Abstract

The article examines the role of corpus linguistics as an independent field that develops and refines methods for collecting natural-language data (written and spoken texts) and for storing and analyzing them. Corpus linguistics matters because it helps optimize the epistemic function of language, which is tied to the preservation and transmission of knowledge and to the expression of national self-awareness. The authors note that research in corpus linguistics mainly addresses the theory and practice of corpus creation, corpus typology, the structuring and selection principles of basic units, and language learning with corpus methods. The article describes the object, subject, features, and purpose of corpus linguistics; this purpose is to provide an objective linguistic description of the language system based on the study of human communication and of the stages and features of its development. The basic concept of corpus linguistics is the text corpus, understood in a broad sense as a collection of written or spoken texts used for language research; for this reason, considerable attention is paid to building and using language corpora. The authors emphasize that corpus technologies can be used to compile electronic dictionaries, in particular through lemmatization and stemming: reducing word forms to a base form shrinks the dictionary, cuts the number of entries, and makes the required word easier to find. The corpus organization of linguistic data must also take the typology of text corpora into account, since the type of corpus shapes the strategy and principles of its creation. The study found that corpus technologies are effective for comparing words, as they help establish the common and distinctive features of different languages and thus outline the peculiarities of their use.
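The idea that lemmatization and stemming reduce the number of dictionary entries can be illustrated with a minimal Python sketch. It assumes a tiny hypothetical lemma table and a crude, hypothetical suffix list (this is not the Porter stemmer or any production morphological analyzer); the helper names `lemmatize`, `stem`, and `build_index` are illustrative only and do not come from the article.

```python
from collections import defaultdict

# Hypothetical toy lemma lexicon: word form -> base (dictionary) form.
# A real system would derive this from a morphological analyzer or corpus.
LEMMA_TABLE = {
    "corpus": "corpus",
    "corpora": "corpus",
    "text": "text",
    "texts": "text",
    "studies": "study",
    "studied": "study",
    "studying": "study",
    "better": "good",
    "best": "good",
}

# Hypothetical, deliberately crude suffix list (not the Porter algorithm).
SUFFIXES = ("ing", "ies", "ed", "es", "s")


def lemmatize(form: str) -> str:
    """Map a word form to its lemma, falling back to the lowercased form."""
    word = form.lower()
    return LEMMA_TABLE.get(word, word)


def stem(form: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    word = form.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_index(tokens, normalize):
    """Group surface forms under one headword produced by `normalize`."""
    index = defaultdict(set)
    for token in tokens:
        index[normalize(token)].add(token.lower())
    return dict(index)


if __name__ == "__main__":
    tokens = ["corpus", "corpora", "text", "texts", "studies",
              "studied", "studying", "best", "better"]
    by_lemma = build_index(tokens, lemmatize)
    by_stem = build_index(tokens, stem)
    n_forms = len({t.lower() for t in tokens})
    # Prints: 9 forms -> 4 lemmas, 8 stems
    print(f"{n_forms} forms -> {len(by_lemma)} lemmas, {len(by_stem)} stems")
```

On this toy input, lemmatization collapses nine word forms into four headwords, including irregular pairs such as best/better → good, while naive stemming yields eight keys, some of them truncated strings such as `corpu`. This illustrates why lemmas, rather than stems, are the natural choice for dictionary headwords, whereas stemming is cheaper and often sufficient for search.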

Published

2023-08-30