Text mining & text analysis

This guide contains resources for researchers about text mining and text analysis (sometimes known as distant reading).

Sources of text data

If you are considering text mining and analysis as a research method, you will need text data of some kind to analyse. Your text data may be:

  • created as a part of your research, e.g. survey responses, interview transcripts
  • collated as part of your research, e.g. journal articles for literature review, writings of an author
  • collated by a third party, e.g. Senate enquiry transcripts, British National Corpus
  • collated via web scraping, e.g. news feeds, social media posts and comments, website content
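Whatever the source, text data is usually gathered into a collection of documents before it can be analysed. The following is a minimal sketch of one simple form of text analysis, counting word frequencies across a small collection. The sample documents, their identifiers and the function names `tokenize` and `word_counts` are all hypothetical and chosen for illustration only; real projects would typically read text from files or a database and use a dedicated text analysis library.

```python
import re
from collections import Counter

# Hypothetical sample documents standing in for real text data,
# e.g. survey responses collected as part of a research project.
documents = {
    "survey_01": "The library was helpful and the staff were helpful too.",
    "survey_02": "Opening hours were too short.",
}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_counts(docs):
    """Count how often each token occurs across all documents."""
    counts = Counter()
    for text in docs.values():
        counts.update(tokenize(text))
    return counts


counts = word_counts(documents)

# Show the three most frequent tokens in the collection.
print(counts.most_common(3))
```

This kind of simple tokenisation ignores punctuation and case; it is only a starting point, and the right preparation steps depend on the data source and the research question.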

Sources of text data include:

Data citation

Acknowledging your data source

Data integrity and acknowledging the source of your data are important. Just as researchers routinely provide a bibliographic reference to sources such as journal articles, reports and conference papers, data citation is the practice of providing a reference to datasets.

Further information about data citation can be found on our Manage research data page.

Data collections & datasets

The Digital Humanities Toychest brings together a collection of data resources including demo corpora, document/image collections, linguistic corpora, map collections and datasets.