
Introduction

The topic of the book is the art or science of Automatic Translation, or Machine Translation (MT) as it is generally known --- the attempt to automate all, or part, of the process of translating from one human language to another. The aim of the book is to introduce this topic to the general reader --- anyone interested in human language, translation, or computers. The idea is to give the reader a clear basic understanding of the state of the art, both in terms of what is currently possible and how it is achieved, and in terms of what developments are on the horizon. This should be especially interesting to anyone who is associated with what are sometimes called ``the language industries'', particularly translators, those training to be translators, and those who commission or use translations extensively. But the topics the book deals with are of general and lasting interest, as we hope the book will demonstrate, and no specialist knowledge is presupposed --- no background in Computer Science, Artificial Intelligence (AI), Linguistics, or Translation Studies.

Though the purpose of this book is introductory, it is not just introductory. For one thing, we will, in Chapter gif, bring the reader up to date with the most recent developments. For another, as well as giving an accurate picture of the state of the art, both practically and theoretically, we have taken a position on some of what seem to us to be the key issues in MT today --- the fact is that we have some axes to grind.

From the earliest days, MT has been bedevilled by grandiose claims and exaggerated expectations. MT researchers and developers should stop over-selling. The general public should stop over-expecting. One of the main aims of this book is that the reader comes to appreciate where we are today in terms of actual achievement, reasonable expectation, and unreasonable hype. This is not the kind of thing that one can sum up in a catchy headline (``No Prospect for MT'' or ``MT Removes the Language Barrier''), but it is something one can absorb, and which one can thereafter use to distill the essence of truth that will lie behind reports of products and research.

With all this in mind, we begin (after some introductory remarks in this chapter) with a description of what it might be like to work with a hypothetical state-of-the-art MT system. This should allow the reader to get an overall picture of what is involved, and a realistic notion of what is actually possible. The context we have chosen for this description is that of a large organization where relatively sophisticated tools are used in the preparation of documents, and where translation is integrated into document preparation. We have chosen it partly because we think this context shows MT at its most useful. In any case, the reader unfamiliar with this situation should have no trouble understanding what is involved.

The aim of the following chapters is to `lift the lid' on the core component of an MT system and give an idea of what goes on inside or rather, since there are several different basic designs for MT systems, to give an idea of what the main approaches are and to point out their strengths and weaknesses.

Unfortunately, even a basic understanding of what goes on inside an MT system requires a grasp of some relatively simple ideas and terminology, mainly from Linguistics and Computational Linguistics, and this has to be given `up front'. This is the purpose of Chapter gif. In this chapter, we describe some fundamental ideas about how the most basic sort of knowledge that is required for translation can be represented in, and used by, a computer.

In Chapter gif we look at how the main kinds of MT system actually translate, by describing the operation of the `Translation Engine'. We begin by describing the simplest design, which we call the transformer architecture. Though now somewhat old hat as regards the research community, this is still the design used in most commercial MT systems. In the second part of the chapter, we describe approaches which involve more extensive and sophisticated kinds of linguistic knowledge. We call these Linguistic Knowledge (LK) systems. They include the two approaches that have dominated MT research over most of the past twenty years. The first is the so-called interlingual approach, where translation proceeds in two stages: input sentences are analyzed into some abstract and ideally language independent meaning representation, from which translations in several different languages can potentially be produced. The second is the so-called transfer approach, where translation proceeds in three stages. Input sentences are first analyzed into a representation which still retains characteristics of the original, source language text. This is then input to a special component (called a transfer component) which produces a representation which has characteristics of the target (output) language. Finally, a target sentence is produced from this second representation.
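The difference between the two-stage and three-stage designs can be made concrete with a deliberately tiny sketch. It is an illustration only, not a description of any real system: the function names, the three-word lexicons, and the symbols `DEF`, `CAT`, and `SLEEP` are all hypothetical, and a flat list of words stands in for the much richer structured representations a real LK system would build.

```python
# Toy sketch only: every name, lexicon entry, and symbol here is hypothetical.

def tokenize(sentence):
    """Lowercase a sentence, drop the final full stop, and split it into words."""
    return sentence.lower().rstrip(".").split()

def to_sentence(words):
    """Join words, capitalize the first letter, and add a full stop."""
    text = " ".join(words)
    return text[0].upper() + text[1:] + "."

# Transfer approach: analysis -> transfer -> synthesis (three stages).
# The transfer lexicon is specific to one language pair (English to French).
EN_FR_TRANSFER = {"the": "le", "cat": "chat", "sleeps": "dort"}

def translate_transfer(sentence):
    source_repr = tokenize(sentence)                        # 1. analysis
    target_repr = [EN_FR_TRANSFER[w] for w in source_repr]  # 2. transfer
    return to_sentence(target_repr)                         # 3. synthesis

# Interlingual approach: analysis -> synthesis (two stages).
# Analysis produces language independent symbols, so one analysis
# can feed generation into several target languages.
EN_TO_INTERLINGUA = {"the": "DEF", "cat": "CAT", "sleeps": "SLEEP"}
INTERLINGUA_TO = {
    "fr": {"DEF": "le", "CAT": "chat", "SLEEP": "dort"},
    "es": {"DEF": "el", "CAT": "gato", "SLEEP": "duerme"},
}

def translate_interlingual(sentence, target_lang):
    meaning = [EN_TO_INTERLINGUA[w] for w in tokenize(sentence)]  # 1. analysis
    lexicon = INTERLINGUA_TO[target_lang]
    return to_sentence([lexicon[c] for c in meaning])             # 2. synthesis

print(translate_transfer("The cat sleeps."))            # Le chat dort.
print(translate_interlingual("The cat sleeps.", "es"))  # El gato duerme.
```

In this sketch, adding a new target language to the interlingual version needs only one more generation lexicon, whereas the transfer version needs a new transfer table for each language pair.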

The still somewhat schematic picture that this provides will be amplified in the two following chapters. In Chapter gif, we focus on what is probably the single most important component in an MT system, the dictionary, and describe the sorts of issue that arise in designing, constructing, or modifying the sort of dictionary one is likely to find in an MT system.

Chapter gif will go into more detail about some of the problems that arise in designing and building MT systems, and, where possible, describe how they are, or could be, solved. This chapter will give an idea of why MT is `hard', and of the limitations of current technology. It also begins to introduce some of the open questions for MT research that are the topic of the final chapter.

Such questions are also introduced in Chapter gif. Here we return to questions of representation and processing, which we began to look at in Chapter gif, but whereas we focused previously on morphological, syntactic, and relatively superficial semantic issues, in this chapter we turn to more abstract, `deeper' representations --- representations of various kinds of meaning.

One of the features of the scenario we imagine in Chapter gif is that texts are mainly created, stored, and manipulated electronically (for example, by word processors). In Chapter gif we look in more detail at what this involves (or ideally would involve), and how it can be exploited to yield further benefits from MT. In particular, we will describe how standardization of electronic document formats and the general notion of standardized markup (which separates the content of a document from details of its realization, so that a writer, for example, specifies that a word is to be emphasized, but need not specify which typeface must be used for this) can be exploited when one is dealing with documents and their translations. This will go beyond what some readers will immediately need to know. However, we consider its inclusion important since the integration of MT into the document processing environment is an important step towards the successful use of MT. In this chapter we will also look at the benefits and practicalities of using controlled languages (specially simplified versions of, for example, English) and sublanguages (specialized languages of sub-domains). Although these notions are not central to a proper understanding of the principles of MT, they are widely thought to be critical for the successful application of MT in practice.
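The separation of content from realization can be illustrated with a minimal sketch. Everything in it is an assumption made for the example: the `<emph>` tag, the two style names, and the sample sentence with its French rendering are hypothetical, and no particular markup standard is implied. The point is only that the writer marks a word as emphasized, while the choice of how emphasis appears is made separately and applies equally to a document and its translation.

```python
import re

# Hypothetical logical markup: <emph>...</emph> says "this is emphasized"
# without saying how the emphasis should look.
EMPH = re.compile(r"<emph>(.*?)</emph>")

# Hypothetical realizations of emphasis, chosen independently of the text.
STYLES = {
    "asterisks": lambda s: "*" + s + "*",
    "capitals": str.upper,
}

def render(text, style):
    """Replace each emphasized span with its realization in the given style."""
    realize = STYLES[style]
    return EMPH.sub(lambda m: realize(m.group(1)), text)

# An invented source sentence and an invented translation of it.
source = "Do <emph>not</emph> open the cover."
translation = "N'ouvrez <emph>pas</emph> le capot."

print(render(source, "asterisks"))      # Do *not* open the cover.
print(render(translation, "capitals"))  # N'ouvrez PAS le capot.
```

Because the markup records what is emphasized rather than how, the same marked-up text can be carried through translation and rendered afterwards in whatever style the target document requires.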

Continuing the orientation towards matters of more practical than theoretical importance, Chapter gif addresses the issue of the evaluation of MT systems --- of how to tell if an MT system is `good'. We will go into some detail about this, partly because it is such an obvious and important question to ask, and partly because there is no other accessible discussion of the standard methods for evaluating MT systems that an interested reader can refer to.

By this time, the reader should have a reasonably good idea of what the `state of the art' of MT is. The aim of the final chapter (Chapter gif) is to try to give the reader an idea of what the future holds by describing where MT research is going and what are currently thought to be the most promising lines of research.

Throughout the book, the reader may encounter terms and concepts with which she is unfamiliar. If necessary the reader can refer to the Glossary at the back of the book, where such terms are defined.






Arnold D J
Thu Dec 21 10:52:49 GMT 1995