Module Details

CE807-7-SP-CO: Text Analytics

Year: 2017/18
Department: Computer Science and Electronic Engineering
Essex credit: 15
ECTS credit: 7.5
Available to Study Abroad / Exchange Students: No
Full Year Module Available to Study Abroad / Exchange Students for a Single Term: No
Outside Option: No

Staff
Supervisor: Professor Massimo Poesio
Teaching Staff: Professor Massimo Poesio
Contact details: School Office, e-mail csee-schooloffice (non-Essex users should add @essex.ac.uk to form the full e-mail address), telephone 01206 872770.

Module is taught during the following terms
Autumn Spring Summer

Module Description

The aim of this module is to provide students with an understanding of text analytics and its applications. Students will be introduced to state-of-the-art methods for extracting structured information (e.g. opinions about products) from unstructured textual data, in particular social media, and to techniques for summarizing and analyzing this information.

Learning Outcomes:

After completing this module, students will be expected to be able to:

1. Use text classification techniques for a variety of applications
2. Develop systems for identifying the entities mentioned in text, the relations between them, and the opinions expressed about these entities
3. Analyze data extracted from social media such as blogs and tweets
4. Develop systems for summarizing textual information.

Outline Syllabus:

1. Text classification: techniques and applications

2. Sentiment analysis

3. Extracting information from text: entities, relations

4. Summarizing textual information

5. Analyzing social media.

Learning and Teaching Methods

Mode of delivery:

2 hours of lectures per week, 2 hours of laboratory time per week.

Assessment

100 per cent Coursework Mark

Coursework

Assignment 1. Text categorization and sentiment analysis, 50% of module. This assignment involves developing a text categorization system, e.g. for sentiment analysis of Twitter data. The assessment is based in part on the code and in part on the report. In the new coursework-only version of the module, the report will also require students to answer theoretical questions. Assignment 1 will be handed out in week 18 and is to be submitted to FASer in week 20.

Assignment 2. Information extraction, 50% of module. This assignment involves developing a system for, e.g., named entity resolution or disambiguation of query logs to Wikipedia. The assessment is based in part on the code produced and in part on the report, which, in the new version of the module, will also require students to answer some theoretical questions. Assignment 2 will be handed out in week 22 and is to be submitted to FASer in week 24.

Bibliography

Recommended:

  • Richert and Coelho - Building Machine Learning Systems with Python (2nd ed.) - Packt Publishing (RC)
  • Manning, Raghavan & Schütze - Introduction to Information Retrieval - Cambridge University Press, 2008 (MRS)
  • Morris - Text Processing in Java - Colloquial Media

Other useful references:

  • Jurafsky & Martin - Speech and Language Processing (2nd ed.) - Prentice Hall