Next: Representing Linguistic Knowledge
Up: Representation and Processing
Previous: Representation and Processing
In this chapter we will introduce some of the
techniques that can be used to represent the kind of information that
is needed for translation in such a way that it can be processed
automatically. This will provide some necessary background for
Chapter
, where we describe how MT systems actually work.
Human translators deploy at least five distinct kinds of
knowledge:
- Knowledge of the source language.
- Knowledge of the target language. This allows them to
produce texts that are acceptable in the target language.
- Knowledge of various correspondences between source language and
target language (at the simplest level, this is knowledge of how individual
words can be translated).
- Knowledge of the subject matter, including ordinary
general knowledge and `common sense'. This, along with
knowledge of the source language, allows them to
understand what the text to be translated means.
- Knowledge of the culture, social conventions, customs,
and expectations, etc. of the speakers of the source and
target languages.
This last kind of knowledge is what allows translators to
act as genuine mediators, ensuring that the target text genuinely
communicates the same sort of message, and has the same sort of impact
on the reader, as the source text.
Since no one
has the remotest idea how to represent or manipulate this sort of
knowledge, we will not pursue it here --- except to note that it is
the lack of this sort of knowledge that makes us think that the proper
role of MT is the production of draft or `literal' translations.
Knowledge of the target language is important because without it,
what a human or automatic translator produces will be ungrammatical,
or otherwise unacceptable. Knowledge of the source language is
important because the first task of the human translator is to figure
out what the words of the source text mean (without knowing what they
mean it is not generally possible to find their equivalent in the
target language).
It is usual to distinguish several kinds of linguistic knowledge:
- Phonological knowledge: knowledge about the sound system of a
language, knowledge which, for example, allows one to work out the
likely pronunciation of novel words. When dealing with written texts,
such knowledge is not particularly useful. However, there is related
knowledge about orthography which can be useful. Knowledge about
spelling is an obvious example.
- Morphological knowledge: knowledge about how words can be
constructed, for example that printer is made up of print +
er.
- Syntactic knowledge: knowledge about how sentences and other
sorts of phrases can be made up out of words.
- Semantic knowledge: knowledge about what words and phrases mean,
and about how the meaning of a phrase is related to the meaning of its
component words.
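To make the morphological example concrete, here is a minimal sketch of how the fact that printer is print + er might be put to use by a program. It is an illustration only: the stem list, the suffix list, and the function name split_word are assumptions made for this example, and realistic morphological analysis has to handle spelling changes and many other complications that are ignored here.

```python
# Toy illustration only. KNOWN_STEMS, KNOWN_SUFFIXES and split_word are
# hypothetical names; the only fact taken from the text is print + er.
KNOWN_STEMS = {"print"}
KNOWN_SUFFIXES = ("er",)


def split_word(word, stems=KNOWN_STEMS, suffixes=KNOWN_SUFFIXES):
    """Return (stem, suffix) if word is a known stem plus a known suffix.

    Returns None when no such analysis is found.
    """
    for suffix in suffixes:
        if word.endswith(suffix):
            stem = word[: -len(suffix)]
            if stem in stems:
                return stem, suffix
    return None


print(split_word("printer"))  # analysed as a stem plus a suffix
print(split_word("water"))    # ends in "er", but "wat" is not a known stem
```

The point of the second call is that knowing the suffix alone is not enough: the analysis also depends on knowledge about which stems exist.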
Some of this knowledge is knowledge about individual words, and is
represented in dictionaries. Examples are the facts that the word
print is spelled the way it is, that it is not made up of other
words, that it is a verb, that it has a meaning related to that of the
verb write, and so on. This, along with issues relating to the
nature and use of morphological knowledge, will be discussed in
Chapter
.
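As a rough indication of what representing word-level knowledge could look like, the facts about print listed above can be written down as a simple record. This is a sketch under stated assumptions, not the dictionary format used by any particular MT system: the class name LexicalEntry and its field names are invented for this illustration.

```python
from dataclasses import dataclass


@dataclass
class LexicalEntry:
    """Hypothetical dictionary record; the field names are assumptions."""
    spelling: str            # how the word is written
    category: str            # syntactic category, such as "verb"
    parts: tuple = ()        # empty when the word is not made up of other words
    related_to: tuple = ()   # words with a related meaning


# The facts about "print" mentioned in the text, and nothing more.
LEXICON = {
    "print": LexicalEntry(
        spelling="print",
        category="verb",
        related_to=("write",),
    ),
}

entry = LEXICON["print"]
print(entry.category, entry.parts, entry.related_to)
```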
However, some of the knowledge is about whole classes or
categories of word. In this chapter, we will focus on this sort of
knowledge about syntax and semantics. Sections
, and
discuss syntax; issues relating to semantics are
considered in Section
. We will look first at how
syntactic knowledge of the source and target languages can be
expressed so that a machine can use it. In the second part of the
chapter, we will look at how this knowledge can be used in automatic
processing of human language.
Arnold D J
Thu Dec 21 10:52:49 GMT 1995