Web crawling is the process of building a collection of webpages by starting with an initial list of URLs (or links) and systematically processing each page to extract its content and any further links to follow. Writing a web crawler requires basic programming knowledge.
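The crawling loop described here can be sketched in a few lines of Python using only the standard library. This is a minimal illustration rather than a production crawler: the names `LinkExtractor`, `crawl`, `fetch` and `max_pages` are all assumptions made for the example, and `fetch` stands for any function you supply that takes a URL and returns the page's HTML (or `None` if the page cannot be retrieved).

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed_urls, fetch, max_pages=50):
    """Breadth-first crawl from seed_urls; returns {url: html}.

    `fetch` is a hypothetical callable: URL in, HTML text (or None) out.
    """
    queue = deque(seed_urls)   # URLs waiting to be processed
    seen = set(seed_urls)      # URLs already queued, to avoid repeats
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        pages[url] = html      # store the page content
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop any #fragment part.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return pages
```

Passing `fetch` in as an argument keeps the loop independent of how pages are downloaded. For real use it might wrap `urllib.request.urlopen`, and a crawler run against live sites should also respect each site's `robots.txt` and limit its request rate.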
Web scraping is used to extract specific content, usually text, from webpages. Web scraping software is designed to recognise different types of content within a website and to acquire and store only the types specified by the user, e.g. article titles or authors from a news website, or prices and product descriptions from a commercial website. Either commercial software or a programming language can be used.
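As a rough illustration of keeping only the user-specified types of content, the sketch below pulls the text out of elements whose `class` attribute matches a requested field name. It uses only Python's standard library. The names `FieldScraper` and `scrape`, and the class names `title` and `price`, are assumptions chosen for the example; a real site will mark up its content differently.

```python
from html.parser import HTMLParser


class FieldScraper(HTMLParser):
    """Stores the text of elements whose class matches a wanted field."""

    def __init__(self, wanted_classes):
        super().__init__()
        self.wanted = set(wanted_classes)
        self.records = {name: [] for name in self.wanted}
        self._field = None   # field currently being captured, if any
        self._tag = None     # tag that opened the captured element
        self._depth = 0      # nested same-name tags inside the capture
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if self._field is not None:
            if tag == self._tag:
                self._depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        for name in classes:
            if name in self.wanted:
                self._field, self._tag = name, tag
                self._depth, self._buffer = 0, []
                break

    def handle_data(self, data):
        if self._field is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if self._field is None or tag != self._tag:
            return
        if self._depth > 0:
            self._depth -= 1
            return
        # Collapse runs of whitespace before storing the text.
        text = " ".join("".join(self._buffer).split())
        self.records[self._field].append(text)
        self._field = None


def scrape(html, wanted_classes):
    """Return {class_name: [text, ...]} for the requested classes only."""
    parser = FieldScraper(wanted_classes)
    parser.feed(html)
    parser.close()
    return parser.records
```

For example, calling `scrape(page_html, ["title", "price"])` on a product page would return the titles and prices and ignore everything else on the page, such as descriptions or navigation text.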
Web scraping and web crawling resources