Probase

Limitations of syntactic approaches:

  1. Natural Language is ambiguous and syntactic patterns will be able to extract only limited amount of relationships. Additionally, syntactic patterns are succeptible to false alarms, when the presence of a pattern was detected incorrectly. for example “... animals other than dogs such as cats ...”.
  2. Syntactic patterns can discover only limited set of relationships.
  3. The applciation of patterns is complex. In an example “... industrialized countries such as US and Germany ...” most syntactic patterns will dicover US isA country and not US isA industrialized country.

The solution is to augument syntactic parsing with semantic understanding. An example is domestic animals other than dogs such as cats. The possible interpretations of this (according to syntactic parsers) are cat isA dog and cat isA domestic animal. At the beginnign it is hard to decide which is correct. However, if we wait for one itration, the answer will be clear. The system will know that domestic animal isA animal, and cat isA animal . Thus, it will be possible to complete the hierarchy.

Another problem that is hard to solve is when the name contains the word and (and other similar cases). In such situation it is hard for system to decide whether it sees two instances, or one complex named instance. It can be cheched with PMI. But using one PMI value means assuming only one meaning of a phrase. The situation can be improved by considering information about context.

Authors propose an iterative procedure, where on each iteration we acquire only the knowledge we are confident in, and use this to further extract knowledge that was missed. The most basic version only considers occurence counts of a concept pair.

As it was mentioned before, syntactic patterns are weak in identifying concept instances correctly. Especially when they are described by phrases instead of one word. For this situation the problem is solved by enumerating all possible combinations. Wrong options will luckily be filtered later. Will they be?

NELL


© 2022. Vitaly Romanov

Powered by Hydejack v8.1.1