The Lancaster-Oslo/Bergen Corpus (LOB) was compiled by researchers in Lancaster, Oslo and Bergen. It consists of one million words of British English texts published in 1961. The texts were sampled from 15 different text categories. Each text is just over 2,000 words long (longer texts were cut at the first sentence boundary after 2,000 words), and the number of texts in each category varies (see table below). Further information about the texts can be found in the LOB manual.
This corpus is the British counterpart of the Brown Corpus of American English, which contains texts printed in the same year, so that the two varieties can be compared directly. The corpus has been grammatically tagged: every word has been assigned a word-class label. Both the tagged and the untagged versions of the corpus are available through ICAME, which also provides an application form and information about the cost.