Dataset
Creating a corpus of all possible word-forms of modified Russian sound verbs using web-scraping methodology. Compilation and adjustment of summary tables for the future tense, imperative, and gerund forms. Forms of “dual action”. Experimental multi-dimensional scaling of web-scraping results.
Research Data
  • Irina V. Ivliyeva, Missouri University of Science and Technology
  • Perry Koob, Missouri S&T Information Technology
Alternative Title

Создание корпуса всех возможных словоформ модифицированных русских глаголов звучания методом веб-извлечения. Составление и корректировка сводных таблиц для форм будущего времени, повелительного наклонения, и деепричастий. Формы совместного действия. Экспериментальное многомерное шкалирование результатов веб-извлечения.

Ивлиева, И.В.
Куб, Перри
______________________________________________

Abstract

Modern Russian dictionaries do not include all possible forms of words. Compiling such an index on paper, even for one part of speech, is not technically feasible. With the appearance of electronic versions of dictionaries, however, we can for the first time try to create an inventory of all possible forms for the lexical-semantic group of Russian sound verbs, using web-scraping methodology. This project attempts to develop a number of comprehensive tables for the prefixed (semantically modified at the word-formation level) sound verbs and all their forms. A novel four-position system of numbering the verbal forms has been introduced to support experimental multi-dimensional scaling of the results. The research output takes into account not only all documented (recorded) modifications of the source verbs, but also all potential derivative forms. The compilation of the tables, the labelling of the elements, and the subsequent analysis of their correlations revealed some technical execution and documentation challenges that are directly related to the semantics of verbal modifications (e.g. the concurrence of the future tense forms, the imperatives of verbs of sound, and forms of “dual action”). The project outcomes may positively impact the development of diverse web-based applications for gathering, visualizing, or analyzing data from a variety of digital lexicographic sources across one or more languages, corpora, or other digital text collections.

Start Date
01 Jun 2021
End Date
30 Apr 2022
Contact Information

Dr. Irina V. Ivliyeva, ivliyeva@mst.edu
Professor of Russian, Arts, Languages, and Philosophy Department
Missouri University of Science and Technology

Perry B. Koob, koobp@mst.edu
Database Administrator/System Administrator
Academic Technology Support Team
Missouri S&T Information Technology

Department(s)
Arts, Languages, and Philosophy
Comments

Ивлиева, И.В., Куб, Перри. Создание корпуса всех возможных словоформ модифицированных русских глаголов звучания методом веб-извлечения. Составление и корректировка сводных таблиц для форм будущего времени, повелительного наклонения, и деепричастий. Формы совместного действия. Экспериментальное многомерное шкалирование результатов веб-извлечения. Июнь 2021 – апрель 2022. Рабочие тезисы. Март 2022.

Document Type
Data
Document Version
Final Version
File Format
text
Language(s)
Russian
Language 2
English
Publication Date
4-7-2022
Publication Date
Apr 2022
Citation Information
Ivliyeva, Irina V., Koob, Perry. Creating a corpus of all possible word-forms of modified Russian sound verbs using web-scraping methodology. Compilation and adjustment of summary tables for the future tense, imperative, and gerund forms. Forms of “dual action”. Experimental multi-dimensional scaling of web-scraping results. June 2021 – April 2022. Working theses. March 2022.