See these examples of researchers using text mining and analysis. The first example was research undertaken at the University of Queensland.
A Bird's-eye view of the past: Digital history, distant reading and sport history - This research investigates the utility of distant reading as a research tool via three newspaper case studies concerning Muhammad Ali, women’s surfing in Australia, and homophobic language and Australian sport. Distant reading is defined as an umbrella term that embraces many practices, including data mining, aggregation, text analysis, and the visual representations of these practices.
Rescued history: Massive text data analysis helps uncover black women's experiences - Researchers used high performance computers to analyze 20,000 documents from the HathiTrust and JSTOR databases that were known to contain information about black women. They used this analysis to build a computational model of the corpus, which they then applied to all 800,000 documents across both databases. To make sense of the huge datasets, the investigators used the computational techniques of topic modeling and data visualization.
Six degrees of Francis Bacon - Text mining the Oxford Dictionary of National Biography for relationships between early modern persons, documents, and institutions to create a digital reconstruction of the early modern social network of England. Researchers used Named-Entity Recognition (NER) to process the unstructured text into structured data – specifically a matrix of documents and named entities – that was amenable to statistical analysis. Researchers also applied statistical graph-learning methods to the structured data and topic modeling.
Text mining genotype-phenotype relationships from biomedical literature for database curation and precision medicine - Researchers developed a highly accurate machine-learning-based text mining approach for mining complete genotype-phenotype relationships from biomedical literature. Disease-gene-variant triplets were extracted from all abstracts in PubMed related to a set of ten important diseases. Mutations associated with the queried disease were identified using a machine-learning (ML) based classification algorithm trained to detect disease-related mutations.
Three real-world applications of text mining to solve specific business problems - Text mining is being applied to answer business questions, optimize day-to-day operational efficiencies, and improve long-term strategic decisions. This article describes practical real-world instances where text mining has been successfully applied in three industries. One example is text mining (keyword and thematic analysis) of technicians' warranty repair comments to gain insights into component defects, leading to informed interventions to prevent them in future.
Using the Google N-Gram corpus to measure cultural complexity - Using the Google Books American 2Gram corpus, this study shows that, as predicted from the cumulative nature of culture, US culture has been steadily increasing in complexity, even when, for economic reasons, the amount of actual discourse as measured by publication volume decreases.
TED Talk: What we learned from 5 million books (YouTube, 14m:08s). This video looks at the surprising things learnt from Google Labs' NGram Viewer.
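To make the "matrix of documents and named entities" from the Six degrees of Francis Bacon example more concrete, here is a minimal Python sketch. It assumes the NER step has already run and produced a list of person names per document; the document ids, the names, and the helper functions `build_matrix` and `cooccurrence` are all hypothetical and not taken from the project. The project itself applied statistical graph-learning methods to its matrix, so the simple co-occurrence count below is only a stand-in to show why structured data is easier to analyse than raw text.

```python
from collections import Counter
from itertools import combinations

# Hypothetical NER output: document id -> person names found in that document.
# Illustrative data only, not drawn from the actual project.
entities_by_doc = {
    "doc1": ["Francis Bacon", "Ben Jonson"],
    "doc2": ["Francis Bacon", "Thomas Hobbes", "Ben Jonson"],
    "doc3": ["Thomas Hobbes"],
}


def build_matrix(entities_by_doc):
    """Return (doc_ids, names, matrix); matrix[i][j] counts mentions
    of entity names[j] in document doc_ids[i]."""
    doc_ids = sorted(entities_by_doc)
    names = sorted({n for ents in entities_by_doc.values() for n in ents})
    column = {name: j for j, name in enumerate(names)}
    matrix = [[0] * len(names) for _ in doc_ids]
    for i, doc_id in enumerate(doc_ids):
        for name in entities_by_doc[doc_id]:
            matrix[i][column[name]] += 1
    return doc_ids, names, matrix


def cooccurrence(entities_by_doc):
    """Count, for each pair of names, how many documents mention both.
    A crude stand-in for the graph-learning step, used here for illustration."""
    counts = Counter()
    for ents in entities_by_doc.values():
        for a, b in combinations(sorted(set(ents)), 2):
            counts[(a, b)] += 1
    return counts


doc_ids, names, matrix = build_matrix(entities_by_doc)
pairs = cooccurrence(entities_by_doc)

for doc_id, row in zip(doc_ids, matrix):
    print(doc_id, row)
for (a, b), n in pairs.most_common():
    print(f"{a} <-> {b}: {n} shared document(s)")
```

Each row of `matrix` is one document and each column one named entity, so pairs of people who appear together in many documents become candidate links in a social network.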