Arnold D J
- These estimates of CEC translation costs are from
- In fact, one can get perfect translations from
one kind of system, but at the cost of radically restricting what an
author can say, so one should perhaps think of such systems as
(multilingual) text creation aids, rather than MT systems.
The basic idea is similar to that of a phrase book, which provides the
user with a collection of `canned' phrases to use. This is fine,
provided the canned text contains what the user wants to say.
Fortunately, there are some situations where this is the case.
- Of course, the sorts of errors one finds in draft translations produced
by a human translator will be rather different from those that one
finds in translations produced by machine.
- Of course, some
languages have larger vocabularies than others, but this is mainly a
matter of how many things the language is used to talk about (not
surprisingly, the vocabulary which Shakespeare's contemporaries had
for discussing high-energy physics was rather impoverished). All
languages have ways of forming new words, and this has nothing
to do with logical perfection.
- Weaver described an analogy of
individuals in tall
closed towers who communicate (badly) by shouting to each other.
However, the towers have a common foundation and basement. Here
communication is easy: ``Thus it may be true that the way
to translate ... is not to attempt the direct route, shouting from
tower to tower. Perhaps the way is to descend, from each language,
down to the common base of human communication --- the real but as yet
undiscovered universal language.''
- Hatim and Mason
[Hatim and Mason1990] give a number of very good examples where
translation requires this sort of cultural mediation.
- For some reason, linguists' trees are
always written upside down, with the `root' at the top, and the leaves
(the actual words) at the bottom.
- In English, SUBJECTs can
only be omitted in imperative sentences, for example orders, such as
Clean the printer regularly, and in some embedded sentences,
e.g. the boxed part of It is essential 122#122
- We have not specified the
time-reference information: see Chapter .
- Another possibility would be to
have another rule which put the translated preposition immediately
after the verb object, giving Turn the button back a position.
- The names of
these particular Semantic Relations should not be taken too seriously.
In fact, of course, it does not much matter what the relations
are called, so long as they are the same in the source and target
- `Paper' here is intended to convey `intended for
human readers', as opposed to `electronic' meaning `intended for use
by computers'. Of course, it is possible for a paper dictionary to be
stored on a computer like any other document, and our use of `paper'
here is not supposed to exclude this. If one were being precise, one
should distinguish `paper' dictionaries, `machine readable'
dictionaries (conventional dictionaries which are stored on, and can
therefore be accessed automatically by computer), and `machine usable
- The form of the monolingual entry is
based on that used in the Oxford Advanced Learner's Dictionary
(OALD); the bilingual entry is similar to what one finds in
the Collins-Robert English-French dictionary.
- One can also get
some idea of the cost of dictionary construction from this. Even
if one were able to write four entries an hour, and keep this up for 8
hours a day every working day, it would still take over three years to
construct even a small dictionary. Of course, the time it takes
to write a dictionary entry is very variable, depending on how much of
the work has already been done by other lexicographers.
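As a rough check on this arithmetic, the sketch below works out what one lexicographer produces at the stated rate. The footnote gives only the rate (four entries an hour, eight hours a day). The figures of 250 working days a year and a 25,000-entry dictionary are assumptions made purely for illustration.

```python
def entries_per_year(entries_per_hour, hours_per_day, working_days_per_year):
    """Entries one lexicographer writes in a year at a constant rate."""
    return entries_per_hour * hours_per_day * working_days_per_year


def years_to_build(dictionary_size, entries_per_hour=4, hours_per_day=8,
                   working_days_per_year=250):
    """Years needed for one lexicographer to write `dictionary_size` entries.

    working_days_per_year=250 is an illustrative assumption, not a figure
    taken from the text.
    """
    yearly = entries_per_year(entries_per_hour, hours_per_day,
                              working_days_per_year)
    return dictionary_size / yearly


if __name__ == "__main__":
    print(entries_per_year(4, 8, 250))   # 8000 entries a year
    print(years_to_build(25_000))        # 3.125 years (hypothetical size)
```

On these assumptions, any dictionary of more than about 24,000 entries takes a single lexicographer over three years.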
- In fact, it is arguable that the vocabulary of a language like English,
with relatively productive morphological processes, is infinite, in
the sense that there is no longest word of the language. Even the
supposedly longest word antidisestablishmentarianism can be
made longer by adding a prefix such as crypto-, or a suffix such
as -ist. The result may not be pretty, but it is arguably a
possible word of English. The point is even clearer when one considers
compound words (see Section .
- The restriction applying to the OBJECT of
the verb actually concerns the thing which is buttoned, whether
that appears as the OBJECT of an active sentence or the SUBJECT of a
- In this rule we write
+finite for finite=+. We also ignore some issues about
datatypes, in particular, the fact that on the right-hand side V
stands for a string of characters, while on the left-hand (lexical)
side it stands for the value of an attribute, which is probably an
atom, rather than a string.
- More precisely, the rule is that
the third person singular form is the base form plus s, except
(i) when the base form ends in s, ch, sh, o,
x, or z, in which case +es is added (for example,
poach becomes poaches, push becomes pushes), and (ii) when the base
form ends in y, when ies is added to the base minus
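The rule as stated can be sketched as a small function. This is a minimal illustration, not the notation used for the book's own rules. The function name is hypothetical, and the truncated clause (ii) is assumed to mean that the final y is dropped before ies is added.

```python
# Endings that take +es according to clause (i) of the rule.
ES_ENDINGS = ("s", "ch", "sh", "o", "x", "z")


def third_person_singular(base):
    """Form the third person singular from a base form.

    Follows the rule as stated in the text. Assumption: clause (ii)
    replaces a final 'y' with 'ies'. As stated, the rule also applies to
    vowel + y bases such as 'play', where English actually has 'plays',
    so a fuller rule would restrict clause (ii) to consonant + y.
    """
    if base.endswith("y"):
        return base[:-1] + "ies"
    if base.endswith(ES_ENDINGS):
        return base + "es"
    return base + "s"


if __name__ == "__main__":
    for verb in ("affect", "poach", "push", "go", "carry"):
        print(verb, third_person_singular(verb))
```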
- Notice, however, that we
still cannot expect morphological analysis and lexical lookup to come
up with a single right answer straight away. Apart from anything else,
a form like affects could be a noun rather than a verb. For
another thing, just looking at the word form in isolation will not
tell us which of several readings of a word is involved.
- Note that the category of the stem
word is important, since there is another prefix un-
which combines with verbs to give verbs which mean `perform the
reverse action to X' --- to unbutton is to
reverse the effect of buttoning.
- Where words have
been fused together to form a compound, as is prototypically the case
in German, an additional problem presents itself in the analysis of
the compound, namely to decide exactly which words the compound
consists of. The German word Wachtraum, for example, could
have been formed by joining Wach and Traum giving a
composite meaning of day-dream. On the other hand, it could
have been formed by joining Wacht to Raum, in which
case the compound would mean guard-room.
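The ambiguity can be made concrete with a short sketch that tries every split point and keeps the splits where both halves are known words. The four-word lexicon and the function name are assumptions for illustration. A real analyser would also need to handle linking elements and compounds of more than two parts.

```python
def two_part_splits(compound, lexicon):
    """Return every way of dividing `compound` into two known words.

    `lexicon` is a set of lower-case word forms. Only two-part compounds
    with no linking element are considered (an illustrative simplification).
    """
    word = compound.lower()
    splits = []
    for i in range(1, len(word)):
        left, right = word[:i], word[i:]
        if left in lexicon and right in lexicon:
            splits.append((left, right))
    return splits


# Hypothetical toy lexicon containing only the words from the example.
LEXICON = {"wach", "wacht", "traum", "raum"}

if __name__ == "__main__":
    print(two_part_splits("Wachtraum", LEXICON))
    # [('wach', 'traum'), ('wacht', 'raum')]
```

Both analyses come back, so choosing between day-dream and guard-room has to be left to later stages of processing.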
in the sense of `genuine invention which is not
governed by rules', rather than the sense of `creating new things by
following rules' --- computers have no problem with creating new things
by following rules, of course.
- This discussion of the Japanese passive is
a slight simplification. The construction does sometimes occur without
the adversative sense, but this is usually regarded as a
`europeanism', showing the influence of European languages.
- This is a simplification,
of course. For one thing, it could be used to refer to
something outside the discourse, to some entity which is not
mentioned, but pointed at, for example. For another thing, there are
some other potential antecedents, such as the back in (411#411
and it could be that Speaker A is returning to the digression in
f). Though the discourse structure can
help to resolve pronoun-antecedent relations, discovering the
discourse structure poses serious problems.
- Politeness dictates that giving by the hearer to
the speaker is normally giving `downwards' (kureru), so this is
the verb used to describe requests, and giving
by the speaker to the hearer is normally giving `upwards'
(ageru), so this is the verb used to describe offers, etc.
- As noted above, knowledge about
selectional restrictions is unusual in being defeasible in just this
way: the restriction that the AGENT of eat is ANIMATE is only
a preference, or default, and can be overridden. This leads some
to think that it is not strictly speaking linguistic knowledge at all.
In general, the distinction between linguistic and real world
knowledge is not always very clear.
- ASCII stands for American Standard Code for Information Interchange.
- For example, suppose one has a printer manual marked up in this way, with
special markup used for the names of printer components wherever they
occur. It would be very easy to extract a list of printer parts
automatically, together with surrounding text. This text might be a
useful addition to a parts database. As regards consistency, it would
be easy to check that each section conforms to a required pattern ---
e.g. that it contains a list of all parts mentioned in the section.
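A minimal sketch of both uses follows. It assumes, purely for illustration, that part names are wrapped in a `<part>` element. The tag name, the sample sentence and the function names are hypothetical and are not taken from any particular markup scheme.

```python
import re

# Hypothetical markup: part names are wrapped as <part>...</part>.
PART_TAG = re.compile(r"<part>(.*?)</part>")


def extract_parts(text, context=20):
    """Return (part name, surrounding text) pairs for each marked-up part."""
    results = []
    for match in PART_TAG.finditer(text):
        start = max(0, match.start() - context)
        end = min(len(text), match.end() + context)
        results.append((match.group(1), text[start:end]))
    return results


def part_names(text):
    """Return the distinct part names mentioned in `text`, sorted."""
    return sorted({match.group(1) for match in PART_TAG.finditer(text)})


def missing_from_list(section_text, listed_parts):
    """Consistency check: parts mentioned in a section but not in its list."""
    return set(part_names(section_text)) - set(listed_parts)


SAMPLE = ("Open the <part>paper tray</part> and check the "
          "<part>toner cartridge</part>. Close the <part>paper tray</part>.")

if __name__ == "__main__":
    print(part_names(SAMPLE))
    print(missing_from_list(SAMPLE, {"paper tray"}))
```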
- Although most elements of the structure are
exactly matched, there may sometimes be differences. For example, if
the document element Paragraph is composed of document element
Sentence(s), it is perhaps unwise to insist that each Sentence in each
language is paired exactly with a single corresponding Sentence in
every other language, since frequently there is a tendency to
distribute information across sentences slightly differently in
different languages. However, at least for technical purposes, it
is usually perfectly safe to assume that the languages are paired
Paragraph by Paragraph, even though these units may contain slightly
different numbers of sentences for each language.
- PACE stands for `Perkins Approved Clear English'.
- For an excellent discussion of the range
of aspects that a good translation may need to take into account, see
Hatim and Mason [Hatim and Mason1990].
- This comes from the section on `Talking to the Tailor' in an
English-Italian phrasebook of the 1920s.
- `Declarative' here is to be
contrasted with `procedural'. A declarative specification of a program
states what the program should do, without considering the order in
which it must be done. A procedural specification would specify both
what is to be done, and when. Properties like Accuracy and
Intelligibility are properties of a system which are independent of the
dynamics of the system, or the way the system operates at all ---
hence `non-procedural', or `declarative'.
- It would be
nice to find possible problem areas by some sort of automatic
scanning of bilingual texts, but the tools and techniques are not
available to date.
- Here `same
value' is to be interpreted strongly, as token identity --- in a
sentence with two nouns, there would be two objects with the `same'
category value, namely, the two nouns. This weaker sense is often called `type'
identity. In everyday usage, when we speak of two people having the
`same' shirt, we normally mean type identity. Token identity would
involve them sharing one piece of clothing. On the other hand, when we
speak of people having the same father, we mean token identity.
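The distinction has a direct counterpart in most programming languages. The sketch below uses Python, where `==` tests whether two objects have equal content (type identity in the sense above) and `is` tests whether two names refer to one and the same object (token identity). Representing a noun as a small dictionary is an assumption made only for illustration.

```python
def same_type(x, y):
    """Type identity: the two values have equal content."""
    return x == y


def same_token(x, y):
    """Token identity: the two names refer to one and the same object."""
    return x is y


# Two separate objects that happen to carry the same category value,
# like two nouns in one sentence or two people with the 'same' shirt.
noun_a = {"cat": "n"}
noun_b = {"cat": "n"}

# A second name for the very same object, like siblings with the same father.
shared = noun_a

if __name__ == "__main__":
    print(same_type(noun_a, noun_b))    # True
    print(same_token(noun_a, noun_b))   # False
    print(same_token(noun_a, shared))   # True
```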
may use the measure of Mutual Information, taking into account
(roughly) the amount of mutual context elements share
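The footnote is cut off and does not say which formulation is meant. As an assumption, the sketch below shows one common form, pointwise mutual information, estimated from co-occurrence counts as log2 of P(x, y) / (P(x) P(y)). The function name and the counts in the usage lines are hypothetical.

```python
import math


def pointwise_mutual_information(pair_count, x_count, y_count, total):
    """Estimate log2( P(x, y) / (P(x) * P(y)) ) from raw counts.

    With P(x, y) = pair_count / total, P(x) = x_count / total and
    P(y) = y_count / total, the ratio simplifies to
    pair_count * total / (x_count * y_count).
    All counts must be positive.
    """
    return math.log2(pair_count * total / (x_count * y_count))


if __name__ == "__main__":
    # Hypothetical counts: two items co-occurring far more often than chance.
    print(pointwise_mutual_information(10, 20, 50, 1000))
    # Co-occurrence at exactly the chance rate gives a score of zero.
    print(pointwise_mutual_information(1, 10, 100, 1000))
```

A positive score means the two items occur together more often than independence would predict. A score near zero means they are unrelated on this measure.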
Thu Dec 21 10:52:49 GMT 1995