
Text mining & text analysis

This guide contains resources for researchers about text mining and text analysis (sometimes known as distant reading).

Machine learning

Text analysis often relies on machine learning, a branch of computer science that trains computers to recognise patterns. Two kinds of machine learning are used in text analysis: supervised learning, where a human supplies labelled examples to train the pattern-detecting model, and unsupervised learning, where the computer finds patterns in text with little human intervention. An example of supervised learning is Naive Bayes classification. See Natural Language Processing and Topic Modelling for examples of unsupervised machine learning.
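As a rough illustration of supervised learning, the sketch below trains a very small Naive Bayes classifier on a handful of hand-labelled sentences. The training sentences, the labels "positive" and "negative", and the function names are all invented for this example, and the whitespace tokeniser is a simplification; a real project would normally use an established library and far more training data.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a string and split it on whitespace."""
    return text.lower().split()


def train(labelled_docs):
    """Count documents per label and words per label from (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in labelled_docs:
        label_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocabulary.add(word)
    return label_counts, word_counts, vocabulary


def classify(text, model):
    """Return the label with the highest log-probability for the text."""
    label_counts, word_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        # Log prior: how common the label is in the training data.
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocabulary:
                continue  # ignore words never seen during training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical, hand-labelled training examples (the "human help").
training_data = [
    ("a wonderful and moving novel", "positive"),
    ("I loved the vivid characters", "positive"),
    ("a dull and tedious plot", "negative"),
    ("I disliked the tedious ending", "negative"),
]
model = train(training_data)
predicted = classify("a moving novel with vivid characters", model)
print(predicted)
```

The human contribution is the list of labelled examples; the model only learns which words tend to appear with which label.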

Natural language processing

Natural language processing, which often draws on machine learning, is the attempt to use computational methods to extract meaning from free text. Among other things, natural language processing algorithms can identify names of people and places, dates, sentiment, and parts of speech.
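To give a feel for what extracting dates from free text involves, here is a minimal sketch that uses a hand-written pattern rather than a trained model. It is a rule-based stand-in, not how natural language processing libraries actually work, and it only recognises dates written as day, month name, and four-digit year. The sample sentence and the names DATE_PATTERN and extract_dates are invented for this example.

```python
import re

MONTHS = ("January|February|March|April|May|June|July|August|"
          "September|October|November|December")
# Matches dates written as day, month name, four-digit year, e.g. "3 March 1921".
DATE_PATTERN = re.compile(r"\b(\d{1,2}) (" + MONTHS + r") (\d{4})\b")


def extract_dates(text):
    """Return a (day, month, year) tuple for every date the pattern finds."""
    return [(int(day), month, int(year))
            for day, month, year in DATE_PATTERN.findall(text)]


# Hypothetical sample sentence.
sample = "The letter was written on 3 March 1921 and answered on 17 April 1921."
dates = extract_dates(sample)
print(dates)
```

A pattern like this misses dates written in any other style, which is one reason researchers usually turn to trained models for this task.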

Topic modelling

Topic modelling, a form of machine learning, is a way of identifying patterns and themes in a body of text. Topic modelling is carried out by statistical algorithms such as Latent Dirichlet Allocation, which groups words into "topics" based on how often they co-occur in the texts.
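The sketch below is not Latent Dirichlet Allocation itself. It only counts how often pairs of words appear in the same document, which is the kind of co-occurrence signal that topic modelling algorithms build on. The four short "documents" and the function name cooccurrence_counts are invented for this example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical mini-corpus; each string stands in for one document.
documents = [
    "ship sea storm captain",
    "sea storm wave ship",
    "court judge trial law",
    "law trial judge jury",
]


def cooccurrence_counts(docs):
    """Count how many documents contain each unordered pair of words."""
    pair_counts = Counter()
    for doc in docs:
        # Sort the unique words so each pair is always stored in the same order.
        words = sorted(set(doc.split()))
        for pair in combinations(words, 2):
            pair_counts[pair] += 1
    return pair_counts


counts = cooccurrence_counts(documents)
for pair, n in counts.most_common(6):
    print(pair, n)
```

In this toy corpus the seafaring words co-occur with each other and the legal words co-occur with each other, but the two groups never mix, which is the kind of pattern a topic model would report as two separate topics.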

Network analysis

Network analysis is a method for finding connections between nodes representing people, concepts, sources, and more. These networks are usually visualised as graphs that show how the nodes are interconnected.

Social network analysis - the process of investigating social structures through the use of networks and graph theory. It characterises networked structures in terms of nodes (individual actors, people, or things within the network) and the edges, or links (relationships or interactions) that connect them. 

Semantic network analysis - the study of semantic networks, which represent semantic relations (meanings) between concepts. A semantic network is a graph consisting of nodes, which represent concepts, and edges, which represent semantic relations between those concepts. It is often used as a form of knowledge representation.
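The nodes and edges described above can be sketched in a few lines of code. This example stores a small undirected network as a dictionary and counts each node's connections (its degree), one of the simplest measures used in network analysis. The names and the letter-exchange scenario are invented for this example; dedicated network analysis tools offer many more measures and visualisations.

```python
from collections import defaultdict

# Hypothetical edges: each pair records that two people exchanged letters.
edges = [
    ("Ada", "Ben"),
    ("Ada", "Cleo"),
    ("Ada", "Dev"),
    ("Ben", "Cleo"),
]


def build_graph(edge_list):
    """Return an undirected graph as a dict mapping each node to its neighbours."""
    graph = defaultdict(set)
    for a, b in edge_list:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def degree(graph):
    """Return the number of edges attached to each node."""
    return {node: len(neighbours) for node, neighbours in graph.items()}


graph = build_graph(edges)
degrees = degree(graph)
most_connected = max(degrees, key=degrees.get)
print(degrees)
print(most_connected)
```

The same structure works for a semantic network if the nodes are concepts and each edge records a relation between two concepts.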

Visualisations

Text visualisation is a way to "see" your data. It can help researchers see relationships between certain concepts. Examples include word clouds, graphs, maps, and other graphics that produce a visual depiction of the data.
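Most word clouds and frequency charts start from a simple count of how often each word appears. The sketch below counts words in a short invented passage and prints a crude text bar chart for the three most frequent words. The sample text and the function name word_frequencies are invented for this example, and a real visualisation would usually remove very common words such as "the" first.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Return a Counter of lowercase words, ignoring punctuation."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


# Hypothetical sample passage.
sample = "The sea was calm. The ship left the harbour, and the sea stayed calm."
freqs = word_frequencies(sample)

# Print one row per word, with one '#' for each occurrence.
for word, count in freqs.most_common(3):
    print(f"{word:<8} {'#' * count}")
```

A word cloud presents the same counts by scaling each word's size instead of drawing a bar.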

Various Text Analysis Projects with Visualisations

Word Frequency Visualisations