
Text mining & text analysis

This guide contains resources for researchers about text mining and text analysis (sometimes known as distant reading).

Text mining & text analysis - what is the difference?

Text mining began in the computational and information management fields (e.g. database searching and information retrieval), whereas text analysis began in the humanities with the manual analysis of text (e.g. Bible concordances and newspaper indexes). More recently, the two terms have become synonymous, and both now generally refer to the use of computational methods to search, retrieve, and analyse text data.

"Text mining or text analytics is an umbrella term describing a range of techniques that seek to extract useful information from document collections through the identification and exploration of interesting patterns in the unstructured textual data of various types of documents – such as books, web pages, emails, reports or product descriptions." (Truyens & van Eecke, 2014)

Manual vs computational text analysis

Researchers have been analysing texts for centuries. Manual text analysis techniques remain valid, and are often preferred, for text collections of a manageable size (say, fewer than 100,000 words). However, with the wide availability of powerful computers, software, and programming languages, many text analysis techniques have been automated for use on collections of text data that are too large to be read, interpreted, and coded manually.
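As a rough illustration of what "automated" can mean here, the sketch below uses Python's standard library to count word frequencies and build a simple keyword-in-context listing, a computational cousin of the manual concordance. The function names (tokenize, word_frequencies, keyword_in_context), the three-word context window, and the sample sentences are assumptions made for this example, and the tokenizer is deliberately naive (lowercase ASCII letters and apostrophes only). Real projects typically rely on dedicated text analysis software.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    # Naive rule (an assumption for this sketch): runs of a-z and apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(text):
    """Count how often each token occurs in the text."""
    return Counter(tokenize(text))


def keyword_in_context(text, keyword, window=3):
    """List each occurrence of keyword with up to `window` words on each side."""
    tokens = tokenize(text)
    hits = []
    for i, token in enumerate(tokens):
        if token == keyword.lower():
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, token, right))
    return hits


# Hypothetical sample text, used only to demonstrate the functions.
sample = (
    "Text mining began in the computational fields. "
    "Text analysis began in the humanities."
)

print(word_frequencies(sample).most_common(3))
for left, word, right in keyword_in_context(sample, "began"):
    print(f"{left} [{word}] {right}")
```

The same two functions work unchanged on a text of any length, which is the practical appeal of computational methods once a collection grows beyond what can be read and coded by hand.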

How does text mining work? (YouTube, 1m:35s). This video is an introduction to text mining and how it can be used in research.

Library Resources that provide an introduction to text mining and analysis