A school-based, collaborative support infrastructure for digital and computational humanities established and maintained by the School of Languages and Cultures at the University of Queensland. The LADAL assists with data processing, visualization, and analysis and offers guidance on matters relating to language technology and digital research tools.
Much of the data available today is unstructured and text-heavy, making it challenging for analysts to apply their usual data wrangling and visualization tools. With this practical book, you'll explore text-mining techniques with tidytext, a package that authors Julia Silge and David Robinson developed using the tidy principles behind R packages like ggraph and dplyr.
The second edition of this book shows you how to use state-of-the-art frameworks in natural language processing, together with machine learning and deep learning, to work through real-world case studies in Python.
Specifically designed for linguists, this book provides an introduction to programming using Python for those with little to no experience of coding. More experienced users of Python will also benefit from the advanced chapters on graphical user interfaces and functional programming.
Provides a practical introduction to computational text analysis using the open-source programming language R. Each chapter builds on its predecessor as readers move from small-scale "microanalysis" of single texts to large-scale "macroanalysis" of text corpora, and each concludes with a set of practice exercises that reinforce and expand upon the chapter lessons. Text Analysis with R is written with students and scholars of literature in mind but will be applicable to other humanists and social scientists wishing to extend their methodological toolkit to include quantitative and computational approaches to the study of text.
NLTK is a powerful Python package that provides a diverse set of natural language processing algorithms. It is free, open source, easy to use, and well documented, and it has a large community. NLTK covers the most common tasks, such as tokenizing, part-of-speech tagging, stemming, sentiment analysis, topic segmentation, and named entity recognition.
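To give a feel for two of the tasks listed for NLTK, here is a minimal sketch of tokenizing and stemming. The sample sentence is made up for illustration, and the sketch assumes NLTK is installed. It uses `wordpunct_tokenize` and `PorterStemmer` only because neither needs an extra data download; other NLTK tokenizers and stemmers may suit real projects better.

```python
from nltk.stem import PorterStemmer
from nltk.tokenize import wordpunct_tokenize

# Hypothetical sample sentence, chosen only for illustration.
text = "The cats are running quickly."

# Split the text into runs of word characters and runs of punctuation.
tokens = wordpunct_tokenize(text)

# Reduce each token to its Porter stem (lowercased by default).
stemmer = PorterStemmer()
stems = [stemmer.stem(token) for token in tokens]

# Print each token next to its stem.
for token, stem in zip(tokens, stems):
    print(f"{token:10} {stem}")
```

Tasks such as part-of-speech tagging or named entity recognition follow the same pattern but require downloading the relevant NLTK models first.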