Evolutionary Parsing Algorithm
Evolutionary parsing uses evolutionary algorithms, which imitate natural selection (survival of the fittest). In this context, new individuals are generated by genetic operators such as the cut operator, which produces a new individual by randomly cutting a subtree off an existing one.
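The cut operator described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the tree encoding `(label, [children])` and the helper names `subtree_paths`, `remove_at`, `cut`, and `size` are assumptions made for the example.

```python
import random

# Hypothetical tree encoding (an assumption): (label, [children]).
# A leaf is a node with an empty child list.

def subtree_paths(tree, path=()):
    """Return the path (tuple of child indices) to every non-root node."""
    paths = []
    for i, child in enumerate(tree[1]):
        child_path = path + (i,)
        paths.append(child_path)
        paths.extend(subtree_paths(child, child_path))
    return paths

def remove_at(tree, path):
    """Return a copy of `tree` without the subtree located at `path`."""
    label, children = tree
    head, rest = path[0], path[1:]
    if not rest:
        return (label, children[:head] + children[head + 1:])
    new_children = list(children)
    new_children[head] = remove_at(children[head], rest)
    return (label, new_children)

def cut(tree, rng=random):
    """Produce a new individual by cutting one random subtree off `tree`."""
    paths = subtree_paths(tree)
    if not paths:  # a lone root has nothing to cut
        return tree
    return remove_at(tree, rng.choice(paths))

def size(tree):
    """Count the nodes in a tree."""
    return 1 + sum(size(child) for child in tree[1])

parent = ("S", [("NP", [("DT", []), ("NN", [])]), ("VP", [("VB", [])])])
child = cut(parent, random.Random(0))
```

The parent is left unmodified, so it can stay in the population alongside the new individual.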
Several parameters influence the performance of the algorithm, such as the figure of merit (FOM) used to evaluate individual parses and the size of the population. Increasing the population size reduces the number of times a parse is generated, but it also affects the quality of the parses produced.
To improve the performance of the algorithm, we experimented with different figures of merit and different population sizes. We also studied the effect of the elitism operator, which speeds up the convergence of the algorithm. Elitism follows the principle of natural selection, or survival of the fittest: individuals that are poorly adapted to the current environment are eliminated, and those that are best adapted are favored and carried over to the next generation. In this way, the algorithm produces better-performing parses. This is important because it allows us to obtain more appropriate grammars than those provided by best-first parsing algorithms.
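A minimal sketch of elitism, under stated assumptions: the function name `next_generation`, the `n_elite` parameter, and the policy of filling the remaining slots with the fittest offspring are illustrative choices, not details given in the text.

```python
def next_generation(population, offspring, fitness, n_elite=1):
    """Build the next generation with elitism.

    The `n_elite` fittest current individuals survive unchanged; the
    remaining slots are filled with the fittest offspring (an assumed
    replacement policy). The population size stays constant.
    """
    size = len(population)
    elite = sorted(population, key=fitness, reverse=True)[:n_elite]
    rest = sorted(offspring, key=fitness, reverse=True)[:size - len(elite)]
    return elite + rest

# Toy usage: individuals are numbers and fitness is the number itself.
current = [3, 9, 1, 4]
children = [2, 5, 8, 0]
new_population = next_generation(current, children, fitness=lambda x: x)
```

Because the best individual is always kept, the best fitness in the population cannot decrease from one generation to the next, which is one reason elitism tends to speed up convergence.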
The training data is a set of part-of-speech tagged sentences taken from the Penn Treebank. Each sentence is represented by a parse tree that indicates its word-word dependencies. The individual parses are stored in a population maintained by the evolutionary algorithm, which produces new individuals from the current population and also modifies existing ones using the cut operator.
The fitness function used to evaluate the individual parses is defined in terms of the probability of the grammar rules involved in the parse. A conservative crossover operator is also employed to reduce the search space and accelerate the convergence of the algorithm. This is achieved by limiting the number of individuals that can be produced, since enlarging the population beyond a certain size makes it impossible to produce a complete parse within a limited number of generations.
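One common way to score a parse from rule probabilities is to multiply the probabilities of the rules it uses, usually computed as a sum of logarithms for numerical stability. The sketch below assumes that formulation; the text does not give the exact definition, and the `RULE_PROB` table and the `log_fitness` name are hypothetical.

```python
import math

# Hypothetical rule probabilities, for illustration only (not from the source).
RULE_PROB = {
    ("S", ("NP", "VP")): 0.9,
    ("NP", ("DT", "NN")): 0.5,
    ("VP", ("VB", "NP")): 0.4,
}

def log_fitness(rules, rule_prob=RULE_PROB):
    """Sum the log probabilities of the rules used in a parse.

    Returns -inf if any rule is missing from the grammar, so such a
    parse always ranks below every parse the grammar can license.
    """
    total = 0.0
    for rule in rules:
        p = rule_prob.get(rule, 0.0)
        if p <= 0.0:
            return float("-inf")
        total += math.log(p)
    return total

# A parse is represented here simply as the list of rules it applies.
parse = [
    ("S", ("NP", "VP")),
    ("NP", ("DT", "NN")),
    ("VP", ("VB", "NP")),
    ("NP", ("DT", "NN")),
]
score = log_fitness(parse)
```

Working in log space keeps the score well behaved for long sentences, where the raw product of many small probabilities would underflow.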
The algorithm has been tested on a number of sentences extracted from radiology reports. It converges in most cases within a few generations and produces correct parses for 80% of the sentences. This performance is comparable to that of a classical best-first chart parser. It is also able to produce novel parses that are not obtainable using exhaustive search. This is possible due to the cut operator, which randomly selects a branch of the parse tree and removes it from the individual. Individuals that produce offspring with a better chance of survival are favored. Those that do not perform well are eliminated, and the process is repeated in the next generation.
We also studied the different figures of merit that can be used for parsing and found that a specific FOM that considers contextual information outperformed the others. This FOM is a good candidate to be used in future evolutionary parsers. In addition, the results obtained with this technique indicate that evolutionary algorithms are a valid approach to parsing.
In this work, we have shown that evolutionary algorithms can be used to produce a bottom-up parser for natural language. This is an important result, since it shows that the methodology developed for these algorithms can improve the results of classic parsers based on exhaustive search techniques. A comparison of different figures of merit demonstrates that this is indeed the case, with the evolutionary parser outperforming the other parsers.
The algorithm is based on the idea that links between linguistic elements are represented as genes, and the individual with the highest fitness represents the solution to syntax parsing. The evolution process continues until a certain number of generations is reached or until the convergence criterion is met. To accelerate this process, the cut operator is introduced, which produces an individual by cutting off a subtree of the parse tree at random. This allows individuals to recover incomplete constituents that were already parsed by previous generations.
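The overall loop, with both stopping conditions, might look like the sketch below. It is a generic illustration under assumptions: the `evolve` function, the `patience` parameter (generations without improvement that count as convergence), and the survivor selection policy are not taken from the text. In a parser, `mutate` would be an operator such as cut and `fitness` the figure of merit; a toy numeric problem stands in for them here.

```python
import random

def evolve(population, fitness, mutate, max_generations=50, patience=5, rng=None):
    """Run an evolutionary loop until a generation limit or convergence.

    Convergence is assumed to mean `patience` consecutive generations
    without improvement of the best fitness.
    """
    rng = rng or random.Random()
    best = max(population, key=fitness)
    stale = 0
    generation = 0
    for generation in range(1, max_generations + 1):
        offspring = [mutate(individual, rng) for individual in population]
        # Fixed population size: the fittest of parents and offspring survive.
        pool = sorted(population + offspring, key=fitness, reverse=True)
        population = pool[:len(population)]
        if fitness(population[0]) > fitness(best):
            best, stale = population[0], 0
        else:
            stale += 1
        if stale >= patience:  # convergence criterion met
            break
    return best, generation

# Toy stand-in problem: move integers toward the target value 10.
def toy_fitness(x):
    return -abs(x - 10)

def toy_mutate(x, rng):
    return x + rng.choice([-1, 1])

start = [0, 3, 20]
best, generations = evolve(start, toy_fitness, toy_mutate,
                           max_generations=100, patience=10,
                           rng=random.Random(1))
```

Because parents compete with their offspring for survival, the best fitness never decreases, and the loop returns both the best individual found and the number of generations actually run.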