World Wide Web Access to Corpora
Corpus Linguistics

[Please note: these pages are no longer maintained]


These pages were created as part of the W3-Corpora Project at the University of Essex, 1996-98.
 


About the W3-Corpora project

The World Wide Web Access to Corpora Project (W3-Corpora) was run at the Department of Language and Linguistics at the University of Essex. The two-year project was funded by the Joint Information Systems Committee (JISC) as part of the W3C-IGE project.

Background

The widespread availability of computing infrastructure makes it possible for linguists and others to consult large collections of texts on-line. Such linguistic corpora represent a valuable but underexploited resource for teaching and research. Uptake has been restricted by the need to master a relatively complicated set of techniques. Many linguists do not yet exploit corpus resources in their research or teaching, and the same is true, to a much greater extent, of students. It is not generally the case that students and researchers have tried to use corpus resources and found the results unhelpful. Rather, they have in general not tried to use them at all. The reason is not inherent conservatism but the difficulties facing the would-be user of corpus resources, who must make a significant, and otherwise unnecessary, investment in hardware and media, and spend considerable time and effort learning corpus searching tools and techniques (which they will typically not otherwise find useful).

Description of the project

The idea of this project was to enable and promote the use of corpus resources by allowing simple and straightforward access, via the WWW, to linguistic corpora. The user needs only access to the WWW to perform corpus searches through a web browser (such as Netscape or Internet Explorer).

Aim

The aim of the project was to provide students and researchers in Linguistics and related disciplines with free access to existing linguistic corpora via the World Wide Web (WWW). This involved:
  • design and provision of appropriate programs for corpus search
  • design and implementation of a WWW interface to those programs
Furthermore, the project aimed to provide its users with on-line documentation to facilitate the use of the resources.

Progress

The project resulted in a WWW site with
  • a search engine by which it is possible to access a number of corpora and search for any phrase/expression
  • on-line help screens with information about how to use the search engine
  • introductory information about corpora and corpus linguistics with links and pointers to sources of further information
  • an on-line tutorial illustrating how the on-site program can be used in linguistic study and research

Staff

The following people were involved in the project:

  • Doug Arnold (director)
  • Ylva Berglund (research officer 1997-98)
  • Natalia Brines-Moya (research officer 1996-97)
  • Martin Rondell (research officer, programmer)

Final Report

The Final Report of the project is available for downloading.
