The Lancaster Parsed Corpus is a parsed subcorpus of the LOB Corpus, compiled by Roger Garside, Geoffrey Leech and Tamas Varadi. It is also available through ICAME.

It is a treebank of over 133,000 words, drawn from sentences of the LOB Corpus. Each sentence is annotated with a phrase-structure parse in the form of labelled bracketing. The labels mark the boundaries of sentence, clause, phrase and coordinated word constituents. The word tags used in the tagged version of the LOB Corpus are also part of the annotation of the Lancaster Parsed Corpus.

ICAME provides more information about the corpus and the tags used, as well as about the licence agreement and cost of the Lancaster Parsed Corpus.


Sample of the Lancaster Parsed Corpus:

A07 418
[S[Na I_PP1A Na] [V can_MD n't_XNOT make_VB V][N a_AT club_MM N][Tb[V
pay_VB V] [N a_AT player_NN N][N[D so_QL much_AP D][N a_AT week_NN N]N]
Tb]._. S]

B04 248
[S[N \OMr_NPT Henry_NP Newton_NP [Po of_INO [N Acton_NP N] Po]N][V does_DOZ
not_XNOT want_VB V][N his_PP$ daughter_NN N] [Ti [Vi to_TO marry_VB Vi][N
a_AT Scotsman_NNP N]Ti] ._. S]
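
The labelled bracketing in the sample can be read mechanically: `[X` opens a constituent labelled X, `X]` closes it, and every other token is a word joined to its LOB tag by an underscore. The following is a minimal Python sketch of such a reader, not a tool distributed with the corpus. The names `TOKEN`, `parse_bracketing` and `leaves` are hypothetical, and the set of characters allowed in a label is an assumption based only on the sample above.

```python
import re

# Assumption: a constituent label starts with a letter and may continue
# with letters, digits, '&', '+' or '-'. Only the sample was consulted.
TOKEN = re.compile(
    r"\[(?P<open>[A-Za-z][A-Za-z0-9&+-]*)"      # opening bracket, e.g. "[N"
    r"|(?P<close>[A-Za-z][A-Za-z0-9&+-]*)\]"    # closing bracket, e.g. "N]"
    r"|(?P<word>[^\s\[\]]+)"                    # tagged word, e.g. "club_MM"
)


def parse_bracketing(text):
    """Return the list of top-level constituents found in text.

    A constituent is a (label, children) tuple whose children list holds
    further constituents and (word, tag) leaves in surface order.
    """
    stack = [("TOP", [])]  # artificial root that collects the results
    for match in TOKEN.finditer(text):
        if match.group("open"):
            node = (match.group("open"), [])
            stack[-1][1].append(node)
            stack.append(node)
        elif match.group("close"):
            if len(stack) == 1 or stack[-1][0] != match.group("close"):
                raise ValueError("unexpected close: " + match.group("close"))
            stack.pop()
        else:
            # Split on the last underscore so that "._." gives (".", ".").
            word, _, tag = match.group("word").rpartition("_")
            stack[-1][1].append((word, tag))
    if len(stack) != 1:
        raise ValueError("unclosed constituent: " + stack[-1][0])
    return stack[0][1]


def leaves(node):
    """Yield the (word, tag) leaves under a constituent, left to right."""
    for child in node[1]:
        if isinstance(child[1], list):
            yield from leaves(child)
        else:
            yield child


SAMPLE = (
    "[S[Na I_PP1A Na] [V can_MD n't_XNOT make_VB V][N a_AT club_MM N][Tb[V "
    "pay_VB V] [N a_AT player_NN N][N[D so_QL much_AP D][N a_AT week_NN N]N] "
    "Tb]._. S]"
)

if __name__ == "__main__":
    sentence = parse_bracketing(SAMPLE)[0]
    print(sentence[0])
    print(" ".join(word for word, _ in leaves(sentence)))
```

Because each closing label is checked against the most recently opened one, the sketch raises `ValueError` on unbalanced input rather than returning a partial tree. The reference line that precedes each sentence (for example `A07 418`) is not handled and would have to be stripped before parsing.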