This chapter is about the role played by dictionaries in MT. Our decision to devote a whole chapter to this discussion reflects the importance of dictionaries in MT.
We shall approach the question of dictionaries in MT obliquely, by considering in some detail the information contained in, and issues raised by, the paper dictionaries with which we are all familiar. There are a number of reasons for this, but the most important is that the dictionaries in existing MT systems are diverse in terms of formats, coverage, level of detail and precise formalism for lexical description. This diversity should not be a surprise. Different theories of linguistic representation can give rise to different views of the dictionary, and different implementation strategies can make even fundamentally similar views of the dictionary look very different in detail. Moreover, the different kinds of MT engine obviously place quite different requirements on the contents of the dictionary. For example, dictionaries in an interlingual system need not contain any translation information per se; all that is necessary is to associate words with the appropriate (collections of) interlingual concepts. By contrast, transformer systems will typically give information about source language items, and their translations, including perhaps information that is really about the target language, and which is necessary to trigger certain transformations (e.g. to do with the placement of particles like `up' in `look it up' and `look up the answer'). Since transfer systems typically use more abstract levels of representation, the associated dictionaries have to contain information about these levels.
In addition, in a transfer system, especially one which is intended to deal with several languages, it is common to separate monolingual dictionaries for source and target languages (which give information about the various levels of representation involved in analysis and synthesis) from bilingual dictionaries which are involved in transfer (which normally relate source and target lexical items, and which normally contain information only about the levels of representation that are involved in transfer).
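The separation of monolingual and bilingual dictionaries described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the format of any actual MT system: the names `MonolingualEntry`, `ENGLISH`, `FRENCH`, `EN_FR` and `translate_item` are hypothetical, the feature names are invented, and the entries are toy examples.

```python
from dataclasses import dataclass, field


@dataclass
class MonolingualEntry:
    """Hypothetical monolingual entry: information used in analysis or synthesis."""
    lemma: str
    category: str
    features: dict = field(default_factory=dict)


# Toy monolingual dictionaries, keyed by (lemma, category).
ENGLISH = {("book", "noun"): MonolingualEntry("book", "noun", {"countable": True})}
FRENCH = {("livre", "noun"): MonolingualEntry("livre", "noun", {"gender": "masculine"})}

# Toy bilingual dictionary: it only relates source items to target items.
# Language-internal details stay in the monolingual dictionaries.
EN_FR = {("book", "noun"): [("livre", "noun")]}


def translate_item(key, source_lexicon, bilingual, target_lexicon):
    """Return the target entries related to a source item, or [] if there are none."""
    if key not in source_lexicon:
        return []
    return [target_lexicon[t] for t in bilingual.get(key, []) if t in target_lexicon]


candidates = translate_item(("book", "noun"), ENGLISH, EN_FR, FRENCH)
print([entry.lemma for entry in candidates])
```

In a layout like this, adding a further language means adding one monolingual dictionary plus the bilingual dictionaries that connect it to the others, while the existing monolingual dictionaries are left unchanged.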
We would like to abstract away from these divergences and points of
detail in order to focus on the main issues. Accordingly, we will
begin with a brief discussion of typical entries that one might find
in a good monolingual `paper' dictionary, and a good bilingual `paper'
dictionary.
We will then briefly discuss the sort of information
about words that one typically finds in MT dictionaries, outlining
some of the different ways such information can be represented. As we
have said, a simple view is that a dictionary is a list of words.
However, it is impractical, and perhaps impossible, to provide an
exhaustive list of words for most languages. This is because of the
possibility of forming new words out of existing ones, by various
morphological processes. Later in this chapter we will look briefly
at these, and provide some discussion of how they can be dealt
with, and the problems they raise in an MT context. We will also
briefly describe the difference between terminology and general
vocabulary.
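To make the point about word formation concrete, the following Python sketch shows how a small word list combined with a few affix rules can recognise words that are not listed explicitly. It is an assumption-laden toy: the lexicon, the rules and the function name `analyse` are hypothetical, and real morphological components are considerably more elaborate.

```python
# Toy lexicon: only base forms are listed (hypothetical entries).
LEXICON = {"kind": "adjective", "read": "verb"}

# Hypothetical affix rules: (affix, category of the base, category of the result).
PREFIX_RULES = [("un", "adjective", "adjective"), ("re", "verb", "verb")]
SUFFIX_RULES = [("ness", "adjective", "noun")]


def analyse(word):
    """Return (parts, category) for a listed or derivable word, else None."""
    if word in LEXICON:
        return [word], LEXICON[word]
    for prefix, base_cat, result_cat in PREFIX_RULES:
        if word.startswith(prefix):
            result = analyse(word[len(prefix):])
            if result and result[1] == base_cat:
                return [prefix + "-"] + result[0], result_cat
    for suffix, base_cat, result_cat in SUFFIX_RULES:
        if word.endswith(suffix):
            result = analyse(word[:-len(suffix)])
            if result and result[1] == base_cat:
                return result[0] + ["-" + suffix], result_cat
    return None


for w in ["kind", "unkindness", "reread", "under"]:
    print(w, analyse(w))
```

The sketch ignores spelling changes at morpheme boundaries (it would not relate `happiness' to `happy', for example), and it says nothing about how a derived word should be translated.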