Annotated Corpora
In addition to the plain text, a corpus can carry additional linguistic
information, called 'annotation'. Annotation can be of different kinds, such as
prosodic, semantic or historical. The most common type of annotated corpus is
the grammatically tagged corpus, in which each word has been assigned a
word-class label (part-of-speech tag).
The Brown Corpus, the LOB Corpus and the British National Corpus (BNC) are
examples of grammatically annotated corpora. The LLC Corpus has been
prosodically annotated. The Susanne Corpus is an example of a parsed corpus, a
corpus that has been syntactically analysed and annotated.
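
To make the idea of a grammatically tagged corpus concrete, here is a minimal sketch in Python. It assumes a simple word_TAG notation, and the sample sentence, the tag names (DET, NOUN, VERB and so on) and the helper functions are all hypothetical. They are not taken from the Brown, LOB or BNC tagsets, whose actual formats differ.

```python
# Hypothetical tagged sentence in a simple word_TAG format. The tags are
# illustrative only and do not follow any particular corpus tagset.
TAGGED = "The_DET corpus_NOUN contains_VERB tagged_ADJ words_NOUN ._PUNCT"


def parse_tagged(line):
    """Split a whitespace-separated word_TAG line into (word, tag) pairs."""
    pairs = []
    for token in line.split():
        # Split on the last underscore so words containing "_" stay intact.
        word, _, tag = token.rpartition("_")
        pairs.append((word, tag))
    return pairs


def words_with_tag(pairs, tag):
    """Return every word whose part-of-speech tag equals the given tag."""
    return [word for word, t in pairs if t == tag]


pairs = parse_tagged(TAGGED)
nouns = words_with_tag(pairs, "NOUN")
print(nouns)
```

Because each word carries its tag, a search can be restricted by word class (here, all nouns) rather than by spelling alone, which is not possible with plain text.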
Annotated corpora constitute a very useful tool for research. In the Tutorial you
can find examples of how to make use of the annotation when searching a corpus.
Further information about corpus annotation and annotated corpora can be found,
for example, in the book Corpus Annotation: Linguistic Information from Computer Text Corpora (external link),
or by using the following links: