Henan Oilfield Information Management Center, Nanyang, Henan 473132, China
[Abstract] Computer technology is playing an ever larger role in linguistics. In the past, corpora had to be compiled by hand, which was time-consuming and labor-intensive; today, data collection, processing, and summarization are done by computer, and researchers need only write and maintain the programs to build large corpora. Likewise, computer-based scoring systems for essay correction are more accurate and efficient than manual marking, and such systems have been applied in many large-scale English examinations at home and abroad. This article reviews the development of automatic scoring systems, their construction, and the linguistic principles and technologies involved; it evaluates how such systems are built, weighs their advantages and disadvantages, and analyzes their current shortcomings and development prospects.
[Chinese Library Classification Number] TP399 [Document Identification Code] A [Article ID] 1002-8129(2017)01-0069-03
In 1946, ENIAC, generally regarded as the world's first general-purpose electronic computer, was unveiled in the United States. It weighed about 30 tons, contained roughly 18,000 vacuum tubes, was very cumbersome, and could not perform overly complex calculations. With the rapid development of technology, computers have become smaller and more versatile. Today's computers not only satisfy people's needs for complex computing but also support deeper and more sophisticated algorithms, moving toward artificial intelligence. Computer technology is also increasingly used in linguistic research, mainly in the following applications:
First, The Corpus
The earliest corpora were databases of collected English language materials, and for a long time this collection work was done by hand. Now, with computer technology, language materials can be gathered and processed in new and more convenient ways. The Brown Corpus is considered representative of the first generation of corpora; it contains a relatively small amount of material, about one million words. In the 1980s, second-generation corpora were born, with far more material than the first generation and many new types of entries. In the 1990s, corpora evolved into a third, more commercially oriented generation, often containing billions of words, and they continue to be refined with more advanced technology.
Second, Machine Translation
Research on machine translation has been going on for nearly 50 years, producing many new theories, methods, and technologies. With the advent of commercial corpora, machine translation made breakthroughs, developing statistical and example-based methods. Although there has been considerable progress in this field, many problems remain; solving them will require breakthroughs not only in computing but also in linguistics.
Third, Automatic Composition Scoring System
The automated essay scoring (AES) system is another important application of computing in linguistics. Writing tasks are an important part of large-scale language tests and appear in almost all such tests at every level, because they measure a test-taker's mastery of the language. Scoring writing tasks, however, poses two problems: it requires substantial manpower and material resources, and individual differences among raters make the results subjective, lowering reliability and validity. The development and application of computer technology have greatly helped solve both problems.
The above is a brief review of the applications of computer technology in linguistics. The following focuses on its application in automated essay scoring.
Ellis Page was a pioneer in the field of automated essay scoring. He created the Project Essay Grader (PEG) system in 1966 to resolve the scoring of the essay sections of large-scale language exams more easily and quickly. At the time, the system scored a composition only by analyzing specific surface features of the text, and its ratings were relatively simple. The research bottleneck in this field was not broken until around 1990, when advances in natural language processing and information retrieval revived research on automated essay scoring. In the 1990s, the Educational Testing Service (ETS) began to develop its first-generation scoring system. Although essay content was not yet included in its assessment and it could only judge sentences of up to 20 words, it was able to score essays by direct evaluation. In the late 1990s, three new automated essay scoring systems emerged: (1) the Intelligent Essay Assessor (IEA), which placed more emphasis on essay content; (2) the Electronic Essay Rater (E-rater), a new system built on ETS's first-generation work that comprehensively considers article structure, sentence structure, and content; and (3) IntelliMetric, the first system to use artificial intelligence to consider and score both the style and the content of a composition.
Having reviewed the development of automated essay scoring systems, we now turn to the computer technologies they use.
Page divides essay scoring into two parts: scoring of content and scoring of linguistic features. The former attends to what the article actually says, while the latter covers syntax, writing mechanics, wording, and expression. The debate has centered on whether these two aspects should be considered together or in isolation; the view that both should be considered jointly has been accepted by most scholars.
Automated essay scoring systems comprehensively use statistical methods, natural language processing techniques, information retrieval techniques, and text clustering techniques. The most important of these are specific text feature analysis, latent semantic analysis, natural language processing, and text classification, discussed in turn below.
(1) Specific Text Feature Analysis Technology
This technique was first used by Page in the PEG system in 1966. Page held that the qualities of a composition are reflected in textual features, and that these features can be measured. For example, a passage can be characterized by its sentence lengths; the complexity of sentence structure can be quantified by counting words such as prepositions and relative pronouns; and the author's vocabulary level can be gauged by variation in word length across the article. To implement the system, Page used variable analysis, where the variables are specific text features that a computer can directly quantify and calculate.
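The kind of directly quantifiable features Page described can be sketched in a few lines of Python. This is an illustrative sketch only: the word lists and the choice of features are invented for the example, not PEG's actual variables.

```python
# A minimal sketch of quantifiable surface features of a text
# (hypothetical feature set; Page's actual variables were more numerous).
import re

PREPOSITIONS = {"in", "on", "at", "by", "with", "from", "to", "of", "for"}
RELATIVE_PRONOUNS = {"who", "whom", "whose", "which", "that"}

def text_features(essay: str) -> dict:
    """Compute simple surface features of the kind PEG used as proxies."""
    sentences = [s for s in re.split(r"[.!?]+", essay) if s.strip()]
    words = re.findall(r"[A-Za-z']+", essay.lower())
    return {
        "num_sentences": len(sentences),
        "avg_sentence_length": len(words) / max(len(sentences), 1),
        "avg_word_length": sum(map(len, words)) / max(len(words), 1),
        "preposition_count": sum(w in PREPOSITIONS for w in words),
        "relative_pronoun_count": sum(w in RELATIVE_PRONOUNS for w in words),
    }

essay = ("The student who wrote this essay used clauses that vary in length. "
         "Short sentences help. Longer sentences with many modifiers add "
         "complexity to the writing.")
print(text_features(essay))
```

Each returned value is a "variable" in Page's sense: a number the computer can calculate directly from the text, which a regression model can then weight against human-assigned scores.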
(2) Latent Semantic Analysis Technology
The central idea of latent semantic analysis is simple. On the one hand, the meaning of a paragraph is largely determined by the vocabulary it contains: replace a word, and the meaning of the whole paragraph may change. On the other hand, the difference in meaning between two paragraphs largely comes down to the different words they contain. In short: word meaning 1 + word meaning 2 + ... + word meaning n = paragraph meaning.
Latent semantic analysis is a complex technique used for text indexing and information retrieval. It is robust and can help identify latent relationships between the vocabularies of different texts. At its core, it builds a matrix over the texts under analysis: the rows correspond to text features such as words, and the columns correspond to the texts themselves. Words that contribute little to an article's score are discarded to narrow the scope of the analysis and reduce the amount of calculation.
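The technique can be sketched as a term-document matrix reduced by a truncated singular value decomposition, after which documents about similar topics end up close together in the latent space. The corpus and the number of latent dimensions below are invented purely for illustration.

```python
# A toy latent semantic analysis sketch: build a term-document matrix,
# truncate its SVD, and compare documents in the reduced space.
import numpy as np

docs = [
    "the cat sat on the mat",
    "a cat and a kitten played",
    "stock markets fell sharply today",
    "investors sold stock as markets fell",
]
vocab = sorted({w for d in docs for w in d.split()})
# Term-document matrix: rows are terms, columns are documents.
A = np.array([[d.split().count(w) for d in docs] for w in vocab], float)

# Truncated SVD keeps only the k strongest latent dimensions.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
doc_vecs = (np.diag(s[:k]) @ Vt[:k]).T  # each row: a document in latent space

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Documents about cats should be closer to each other than to finance ones.
print(cos(doc_vecs[0], doc_vecs[1]), cos(doc_vecs[0], doc_vecs[2]))
```

In an essay scorer built this way, a student essay is projected into the same latent space and compared against reference essays of known quality; its similarity to high-scoring references contributes to the content score.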
(3) Natural Language Processing Technology
This technique was first applied in the E-rater system, which uses it to analyze every sentence in an article. For example, a part-of-speech tagger assigns each word a part of speech, a parser then analyzes sentence structure, and a discourse analyzer examines the paragraph structure of the article. A scoring system using this technology comprises five separate modules. Three of them use recognized features as scoring criteria: a syntactic module, a discourse module, and a topic analysis module, which analyze syntactic complexity, organization of ideas, and vocabulary ability respectively. The fourth module chooses the weights assigned to each feature, and the last calculates the final score.
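The five-module architecture described above can be sketched schematically as follows. The proxy features, marker lists, and weights here are hypothetical stand-ins chosen for the example, not E-rater's actual model.

```python
# Schematic sketch of a modular essay scorer: three feature modules,
# a weighting step, and a final scoring step (all values illustrative).
import re

def syntactic_module(essay):
    """Proxy for syntactic complexity: average sentence length."""
    sents = [s for s in re.split(r"[.!?]+", essay) if s.strip()]
    return len(essay.split()) / max(len(sents), 1)

def discourse_module(essay):
    """Proxy for organization of ideas: count of discourse markers."""
    markers = {"first", "however", "therefore", "finally", "moreover"}
    return sum(w.strip(",.").lower() in markers for w in essay.split())

def topic_module(essay, prompt_terms):
    """Proxy for topical vocabulary: overlap with prompt terms."""
    words = {w.strip(",.").lower() for w in essay.split()}
    return len(words & prompt_terms)

def final_score(essay, prompt_terms, weights=(0.2, 1.0, 0.5)):
    """Weighted combination of the module outputs (weights illustrative)."""
    feats = (syntactic_module(essay),
             discourse_module(essay),
             topic_module(essay, prompt_terms))
    return sum(w * f for w, f in zip(weights, feats))

prompt = {"pollution", "environment", "cities"}
essay = ("First, pollution harms cities. However, the environment can "
         "recover. Therefore, action matters.")
print(round(final_score(essay, prompt), 2))
```

In a real system the weights would be fitted, for instance by regression against human-assigned scores, rather than fixed by hand.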
(4) Text Classification Technology
This technique is mainly used to classify and extract the vocabulary, syntax, and other elements appearing in an article, and to build a corresponding corpus that provides a reference database from which the scoring system can extract features and perform comparative analysis; combined with the other methods, it produces the final score.
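One common form of text classification for scoring is to assign an essay to a score band by comparing its vocabulary against a corpus of previously graded essays. The sketch below uses a tiny multinomial Naive Bayes classifier; the training sentences and labels are invented for illustration, whereas real systems train on large graded corpora.

```python
# Minimal multinomial Naive Bayes sketch: classify essays into score
# bands from word counts (tiny invented training set for illustration).
from collections import Counter
import math

train = [
    ("the argument is coherent and well supported by evidence", "high"),
    ("clear thesis with strong supporting evidence throughout", "high"),
    ("the essay is short and has many errors", "low"),
    ("weak thesis and many grammar errors", "low"),
]

def fit(data):
    counts, priors = {}, Counter()
    for text, label in data:
        priors[label] += 1
        counts.setdefault(label, Counter()).update(text.split())
    return counts, priors, len(data)

def predict(text, counts, priors, n):
    vocab = {w for c in counts.values() for w in c}
    best, best_lp = None, -math.inf
    for label, c in counts.items():
        total = sum(c.values())
        lp = math.log(priors[label] / n)
        for w in text.split():
            # Laplace smoothing so unseen words do not zero out the score.
            lp += math.log((c[w] + 1) / (total + len(vocab)))
        if lp > best_lp:
            best, best_lp = label, lp
    return best

model = fit(train)
print(predict("coherent argument with evidence", *model))
```

The classifier's predicted band can then serve as one feature among several, combined with the surface-feature and content measures described earlier to produce the final score.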
With the development of computer technology, automated essay scoring systems have gradually improved. As the technology matures and discourse analysis allows more linguistic elements to be extracted, such systems have become widely usable in large-scale language testing and can deliver reliable score assessments. Even so, a large gap remains between automatic and manual scoring. Reducing individual error and scoring more accurately and more specifically is the direction for the next generation of automatic essay scoring systems. With the advancement of intelligent technology, these systems will become more complete, and humans will before long be able to hand the task of marking compositions over to the computer with confidence.