Text mining & text analysis

This guide contains resources for researchers about text mining and text analysis (sometimes known as distant reading).

Web scraping as a source of text data

Web crawling

The process of building a collection of webpages by starting from an initial list of URLs (links), often called seed URLs, and systematically visiting each page to extract its content and any further links, which are then added to the list of pages to visit. Writing a web crawler requires basic programming knowledge.
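The crawling process described above can be sketched as a breadth-first loop over a queue of URLs. This is a minimal illustration using only Python's standard library, not a production crawler. The names `crawl`, `fetch_url` and `LinkParser` are hypothetical, pages are assumed to be UTF-8 HTML, and the sketch does not check robots.txt or pause between requests, both of which a real crawler should do.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects href targets of <a> tags and the text nodes of a page.

    Note: text inside <script> or <style> is not filtered out here.
    """

    def __init__(self):
        super().__init__()
        self.links = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        if data.strip():
            self.text_parts.append(data.strip())


def fetch_url(url):
    """Download a page and decode it as UTF-8 (an assumption; real pages vary)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(seed_urls, fetch=fetch_url, max_pages=50):
    """Breadth-first crawl returning {url: extracted text} for each visited page."""
    queue = deque(seed_urls)  # URLs waiting to be processed
    seen = set(seed_urls)     # URLs already queued, so no page is visited twice
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except (OSError, ValueError):
            continue  # skip pages that fail to download
        parser = LinkParser()
        parser.feed(html)
        parser.close()  # flush any text still buffered by the parser
        pages[url] = " ".join(parser.text_parts)
        for href in parser.links:
            # Resolve relative links and drop #fragments so duplicates match.
            absolute, _ = urldefrag(urljoin(url, href))
            if absolute.startswith(("http://", "https://")) and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return pages
```

For example, `crawl(["https://example.org/"], max_pages=10)` would visit at most ten pages reachable from the seed URL. The `fetch` parameter is there so a different download function (or a local copy of a site) can be substituted without changing the crawl loop.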

Web scraping

The process of extracting text from webpages. Web scraping software is designed to recognise different types of content within a website and to acquire and store only the types of content specified by the user, e.g. article titles or authors from a news website, or prices and product descriptions from a commercial website. Scraping can be done with commercial software or with a program written in a general-purpose programming language.
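As a rough illustration of extracting only user-specified content, the sketch below collects the text of elements identified by a tag name and CSS class, such as headline and author elements on a news page. It uses only Python's standard library; the names `FieldScraper` and `scrape` and the class names in the example (`title`, `author`) are hypothetical, since every website uses its own markup, which has to be inspected before scraping.

```python
from html.parser import HTMLParser


class FieldScraper(HTMLParser):
    """Collects the text of elements whose (tag, class) pair is listed in `targets`.

    `targets` maps an output field name to a (tag, css_class) pair,
    e.g. {"title": ("h2", "title")}.
    """

    def __init__(self, targets):
        super().__init__()
        self.targets = targets
        self.results = {field: [] for field in targets}
        self.current = None  # field currently being captured, if any
        self.buffer = []

    def handle_starttag(self, tag, attrs):
        # An element may carry several space-separated classes.
        classes = (dict(attrs).get("class") or "").split()
        for field, (want_tag, want_class) in self.targets.items():
            if tag == want_tag and want_class in classes:
                self.current = field
                self.buffer = []

    def handle_data(self, data):
        if self.current is not None:
            self.buffer.append(data)

    def handle_endtag(self, tag):
        # Stop capturing at the first end tag matching the target's tag name.
        if self.current is not None and tag == self.targets[self.current][0]:
            self.results[self.current].append("".join(self.buffer).strip())
            self.current = None


def scrape(html, targets):
    """Return {field: [text, ...]} for every matching element in `html`."""
    parser = FieldScraper(targets)
    parser.feed(html)
    parser.close()
    return parser.results


page = """
<article><h2 class="title">Rain expected</h2><span class="byline author">By <b>A. Writer</b></span></article>
<article><h2 class="title">Markets rise</h2><span class="author">B. Reporter</span></article>
"""
fields = scrape(page, {"title": ("h2", "title"), "author": ("span", "author")})
print(fields)
```

Because the parser stops capturing at the first matching end tag, it does not handle a target element nested inside another element with the same tag name. Libraries that build a full document tree, such as Beautiful Soup, cope with such markup more robustly.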

Web scraping and web crawling resources