
Knowledge-Based MT

 

The term knowledge-based MT has come to describe a rule-based system displaying extensive semantic and pragmatic knowledge of a domain, including an ability to reason, to some limited extent, about concepts in the domain (the components, installation and operation of a particular brand of laser printer could constitute a domain). We noted the appeal of such an approach as a way of solving some basic MT problems in earlier chapters. Essentially, the premise is that high quality translation requires in-depth understanding of the text, and developing a domain model would seem to be necessary for that sort of deep understanding. One of the important considerations driving this work is the recognition that post-editing is time-consuming and very expensive, so effort spent producing high quality output will pay off in the long run. Since this approach may well turn out to be of great utility, in this section we concentrate on one that attempts some degree of text understanding on the basis of detailed domain knowledge, developed at the Center for Machine Translation at Carnegie Mellon University in Pittsburgh.

To give some idea of what is at stake here, the prototype systems developed at CMU during the late 1980s for English-Japanese translation of instruction manuals for personal computers contained the following components:

For a small vocabulary (around 900 words), some 1500 concepts were defined in detail. The ontology dealt solely with the interaction between personal computers and their users. Nouns in the interlingua correspond to `object concepts' in the ontology, which also contains `event concepts', such as the event remove, corresponding to the English verb remove and the Japanese verb torinozoku. By no means are all mappings from the interlingua into natural language as straightforward as this; for example, the concept to-press-button must be divided into subevents corresponding to pressing, holding down and releasing the button. Concepts are represented in a frame representation language, familiar from work in Artificial Intelligence and Natural Language Processing, in which frames (providing an intrinsic characterisation of concepts) are linked in a hierarchical network. To give an idea of the amount of detailed knowledge about concepts that one might want to encode, the table below gives, by way of example, a frame for the concept computer.

  
Table: Example Frame for the concept computer
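The frame machinery described above (concepts as frames, is-a links forming a hierarchy, slot values inherited from more general concepts, and complex events broken into subevents) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the CMU representation language: the class `Frame`, its methods `get` and `isa`, and every slot name and filler below are hypothetical, and the slots do not reproduce the actual computer frame from the table. Only the concept names remove, torinozoku and to-press-button, and its three subevents, come from the text.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # frames are compared by identity, not by slot contents
class Frame:
    """Hypothetical concept frame: a name, an is-a link, slot fillers, subevents."""
    name: str
    is_a: Optional["Frame"] = None
    slots: dict = field(default_factory=dict)
    subevents: list = field(default_factory=list)  # used only by complex events

    def get(self, slot):
        """Return a slot filler, inheriting it from the nearest ancestor."""
        frame = self
        while frame is not None:
            if slot in frame.slots:
                return frame.slots[slot]
            frame = frame.is_a
        return None

    def isa(self, other):
        """True if this concept is `other` or lies below it in the hierarchy."""
        frame = self
        while frame is not None:
            if frame is other:
                return True
            frame = frame.is_a
        return False


# Hypothetical fragment of a personal-computer ontology (slot names are made up).
thing = Frame("thing")
obj = Frame("object", thing)
device = Frame("device", obj, {"power-source": "electricity"})
computer = Frame("computer", device, {"has-parts": ["processor", "memory", "keyboard"]})
personal_computer = Frame("personal-computer", computer, {"users-at-a-time": 1})
mainframe = Frame("mainframe", computer)

event = Frame("event", thing)
# Simple event concept: maps directly onto English "remove" / Japanese "torinozoku".
remove = Frame("remove", event, {"en": "remove", "ja": "torinozoku"})
# Complex event concept: must be split into subevents before generation.
press_button = Frame("to-press-button", event,
                     subevents=["press", "hold-down", "release"])

print(personal_computer.get("power-source"))      # electricity (inherited from device)
print(mainframe.isa(computer), computer.isa(mainframe))  # True False
```

Inheritance is what keeps such an ontology manageable: a fact stated once on device need not be repeated on every kind of computer.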

 

Knowledge-based MT is still pursued today at CMU in the KANT system, but with much more modest goals for domain knowledge, which is limited to what is necessary for stylistically adequate, accurate translation, as opposed to deep textual understanding. Thus the domain model simply represents all the concepts relevant in the domain, but does not support any further reasoning or inference about them beyond what is directly encoded (e.g. hierarchical information such as the fact that personal computers and mainframes are types of computer). The essential role of the domain model is to support full disambiguation of the text. An important part of this is specifying, for every event concept in the domain, what restrictions it places on the object concepts which constitute its arguments (e.g. in a literal sense, only living things can die and only humans can think), that is, on the `fillers' of the `slots' in its (frame-based) representation.
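To make the role of argument restrictions concrete, here is a minimal sketch of how they can rule out word senses. It assumes a toy is-a hierarchy and restriction table; all concept names, the two senses of mouse, the event connect, and the function `disambiguate` are hypothetical illustrations, not KANT's actual data or interface. The restrictions on die and think follow the examples in the text.

```python
# Hypothetical is-a hierarchy: child concept -> parent concept.
IS_A = {
    "living-thing": "thing",
    "animal": "living-thing",
    "human": "animal",
    "mouse-animal": "animal",
    "device": "thing",
    "input-device": "device",
    "mouse-device": "input-device",
}

# Hypothetical restrictions: event concept -> {slot: most general allowed filler}.
RESTRICTIONS = {
    "die": {"theme": "living-thing"},
    "think": {"agent": "human"},
    "connect": {"theme": "device"},
}

# Hypothetical lexicon: an ambiguous English noun maps to several object concepts.
NOUN_SENSES = {
    "mouse": ["mouse-animal", "mouse-device"],
    "user": ["human"],
}


def is_a(concept, ancestor):
    """Walk up the hierarchy from `concept`, looking for `ancestor`."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = IS_A.get(concept)
    return False


def disambiguate(event, slot, noun):
    """Keep only the senses of `noun` that satisfy the event's slot restriction."""
    required = RESTRICTIONS[event][slot]
    return [sense for sense in NOUN_SENSES[noun] if is_a(sense, required)]


print(disambiguate("die", "theme", "mouse"))      # ['mouse-animal']
print(disambiguate("connect", "theme", "mouse"))  # ['mouse-device']
```

In this sketch, exactly one surviving sense means the restriction has disambiguated the noun. More than one means the sentence is still ambiguous, and none means the literal restriction is violated, which is where phenomena such as metaphor start to matter.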

Once you start adding detailed knowledge in the pursuit of high quality translation through text understanding, it is tempting to add more and more sources of knowledge. It is quite clear that anaphora resolution and the resolution of other referential ambiguities require reference to a level of structure above sentential syntax and semantics (see e.g. the examples in an earlier chapter). Likewise, for stylistic reasons, to increase the cohesiveness of the text, one might need to keep some working representation of the paragraph structure. Achieving a really high quality translation, especially with some sorts of text, might require treatment of metaphor, metonymy, indirect speech acts, speaker/hearer attitudes and so on. Over the last few years a variety of groups in different parts of the world have begun experimenting with prototypes intended to work with explicit knowledge or rule components dealing with a wide variety of different types of information. All of these approaches can be viewed as examples, of one form or another, of knowledge-based MT.
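Why anaphora resolution needs structure beyond the single sentence can be shown with a deliberately simplified sketch: a discourse history carried across sentences, searched from the most recent mention backwards, with domain restrictions filtering candidates. The concepts, tables, the class `Discourse`, and the recency-plus-restriction heuristic itself are all hypothetical assumptions for illustration; they are not the method of any particular system mentioned above.

```python
# Hypothetical is-a hierarchy: child concept -> parent concept.
IS_A = {
    "animate": "thing",
    "inanimate": "thing",
    "human": "animate",
    "device": "inanimate",
    "drive": "device",
    "storage-medium": "inanimate",
    "disk": "storage-medium",
}

# Hypothetical pronoun constraints and event argument restrictions.
PRONOUN_TYPE = {"it": "inanimate", "he": "human", "she": "human"}
RESTRICTIONS = {"format": {"theme": "storage-medium"}, "switch-on": {"theme": "device"}}


def is_a(concept, ancestor):
    """Walk up the hierarchy from `concept`, looking for `ancestor`."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = IS_A.get(concept)
    return False


class Discourse:
    """Keeps the concepts mentioned so far, across sentence boundaries."""

    def __init__(self):
        self.history = []  # oldest mention first

    def mention(self, concept):
        self.history.append(concept)

    def resolve(self, pronoun, event, slot):
        """Return the most recent mention compatible with pronoun and event, else None."""
        required = RESTRICTIONS[event][slot]
        for concept in reversed(self.history):
            if is_a(concept, PRONOUN_TYPE[pronoun]) and is_a(concept, required):
                return concept
        return None


d = Discourse()
d.mention("disk")   # "Remove the disk from the drive."
d.mention("drive")
print(d.resolve("it", "format", "theme"))     # disk  ("Then format it.")
print(d.resolve("it", "switch-on", "theme"))  # drive ("Then switch it on.")
```

Recency alone would pick drive for "Then format it."; it is the domain restriction on format, applied to mentions from a previous sentence, that selects disk.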





Arnold D J
Thu Dec 21 10:52:49 GMT 1995