The term knowledge-based MT has come to describe a rule-based system displaying extensive semantic and pragmatic knowledge of a domain, including an ability to reason, to some limited extent, about concepts in the domain (the components, installation and operation of a particular brand of laser printer could constitute a domain). We noted the appeal of such an approach as a way of solving some basic MT problems in earlier chapters. Essentially, the premise is that high quality translation requires in-depth understanding of the text, and the development of the domain model would seem to be necessary to that sort of deep understanding. One of the important considerations driving this work is an appreciation that post-editing is time-consuming and very expensive, and therefore that efforts made to produce high quality output will pay off in the long run. Since this may well turn out to be of great utility, in this section we concentrate on an approach which attempts some degree of text understanding on the basis of detailed domain knowledge, developed at the Center for Machine Translation at Carnegie Mellon University in Pittsburgh.
To give some idea of what is at stake here, the prototype systems
developed for English-Japanese translation during
the late 1980s at CMU, dealing with the translation of instruction
manuals for personal computers, contained the following components:
Table: Example Frame for the concept computer
Knowledge-based MT is still pursued today at CMU in the KANT system, but with much more modest goals for domain knowledge, which is limited to that which is necessary for stylistically adequate, accurate translation, as opposed to deep textual understanding. Thus the domain model simply represents all the concepts relevant in the domain, but does not support any further reasoning or inference about the concepts in the domain, other than that which is directly encoded (e.g. hierarchical information such as the fact that personal computers and mainframes are types of computer). The essential role of the domain model is to support full disambiguation of the text. An important part of this is specifying, for every event concept in the domain, what restrictions it places on the object concepts which constitute its arguments (e.g. only living things can die, only humans can think, in a literal sense) or the `fillers' of `slots' in its (frame-based) representation.
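To make the idea of a restricted domain model more concrete, the following is a minimal sketch in Python of a directly encoded concept hierarchy together with the restrictions that event concepts place on the fillers of their slots. It is not the KANT representation: the concept names, the slot names (`undergoer`, `agent`), and the functions `is_a` and `disambiguate_filler` are all assumptions introduced purely for illustration. The only reasoning supported is a walk up the hierarchy, which is then used to discard word senses that violate a restriction.

```python
# Toy domain model. Every concept name, slot name and restriction below is a
# hypothetical illustration, not taken from KANT or any other CMU system.

# Directly encoded hierarchy: child concept -> parent concept.
IS_A = {
    "personal-computer": "computer",
    "mainframe": "computer",
    "computer": "device",
    "mouse-device": "device",
    "human": "living-thing",
    "mouse-animal": "living-thing",
}

# Frame for each event concept: slot name -> concept its filler must be a kind of.
EVENT_FRAMES = {
    "die": {"undergoer": "living-thing"},
    "think": {"agent": "human"},
}


def is_a(concept, ancestor):
    """Return True if `concept` is `ancestor` or lies below it in IS_A."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = IS_A.get(concept)  # None once the top of the hierarchy is reached
    return False


def disambiguate_filler(event, slot, candidates):
    """Keep only the candidate concepts that satisfy the slot's restriction."""
    restriction = EVENT_FRAMES[event][slot]
    return [c for c in candidates if is_a(c, restriction)]


# "mouse" is ambiguous between an animal and a pointing device.
senses = ["mouse-animal", "mouse-device"]
print(disambiguate_filler("die", "undergoer", senses))  # ['mouse-animal']
```

In this sketch the restriction on `die` eliminates the device reading of "mouse", which is the kind of disambiguation the domain model is meant to support. A realistic model would of course need far more concepts and some treatment of non-literal uses, which this toy version ignores.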
Once you start adding detailed knowledge in the pursuit of high
quality translation through text understanding, it is tempting to add
more and more sources of knowledge. It is quite clear that anaphora
resolution and the resolution of other referential ambiguities
require reference to a level of structure above sentential syntax and
semantics (see e.g. the examples in Chapter
).
Likewise, for stylistic reasons, to increase the cohesiveness of the
text, one might need to keep some working representation of the
paragraph structure. Achieving a really high quality translation,
especially with some sorts of text, might require treatment of
metaphor, metonymy, indirect speech acts, speaker/hearer attitudes and
so on. Over the last few years a variety of groups in different parts
of the world have begun experimenting with prototypes intended to work
with explicit knowledge or rule components dealing with a wide variety
of different types of information. All of these approaches can be
viewed as examples, of one form or another, of knowledge-based MT.