Spring 2021 GC Digital Initiatives

February 25, 2021 Marilyn Weber

The GC Digital Fellows will be offering workshops again this semester on topics such as data mining and predictive modeling, data and text analysis, mapping, digital archiving, and audio editing. Participation is free and open to anyone with a GC affiliation; however, registration at least 24 hours in advance is requested. Please share the following list with students and faculty in your programs.

You can also learn more about how to make use of GC Digital Initiatives resources this semester by reading our “Welcome Back” blog post, which explains how to make the most of GCDI consultations, workshops, events, working groups, and resources.

~~~~~~~~~~~~~~~~~~~~~~

Working with HathiTrust Data, March 8 @ 1:00 pm – 2:30 pm

The HathiTrust Digital Library is the largest set of digitized books in the world. Its collection spans over 17 million items and is composed of the combined collections of several major research libraries in the United States. In addition to making a significant amount of its collection freely available online, HathiTrust also provides cloud-based research computing interfaces to sort and analyze its collections for scholarly purposes. As such, the HathiTrust collection presents an extremely rich source of data as well as a versatile set of tools for computational inquiry across the humanities and the social sciences. This workshop provides an overview of the interface for accessing and analyzing the HathiTrust’s data. Participants will learn some basic digital text analysis concepts and have an opportunity to start playing with the HathiTrust data.

This workshop will run in a hybrid format. Participants will be emailed resources to read and short assignments to try out on March 1. We will then meet via Zoom on March 8 at 1:00 pm to discuss your experiences using the tools and address any questions or concerns that you may have. Zoom details will be provided to participants who register.

R (Text Mining or Predictive Modeling), March 19 @ 3:00 pm – 4:30 pm

This workshop will introduce a workflow for predictive modeling using R. We will introduce machine learning at a conceptual and coding level without touching the math, and will cover both supervised learning and unsupervised learning at a basic level. After taking this workshop, you will be able to apply the machine learning framework to any dataset for predictive tasks. Basic R programming skills are required (you can test yourself by reading the DRI 2021 Intro to R Workshop Materials).

Choosing the Right Platform for Your Digital Archive, March 17 @ 11:30 am – 1:00 pm

This workshop is designed for students, faculty, and staff at all levels interested in creating a digital archive as part of their research projects or for use in the classroom. A brief discussion on some of the theoretical underpinnings of digital archival work will be followed by an overview of a range of tools and platforms (including WordPress, Omeka, Tropy, Drupal, Jekyll, and Collective Access). Together, we will go over a number of questions and examples that will help you choose a platform that best fits your needs and those of your projects.

Intro to Text Analysis in R, March 22 @ 12:00 pm – 1:30 pm

Working with text is an important component in many academic and industrial contexts. Whether your aim is to understand public reactions to the latest political event on Twitter, determine the appropriate target demographic for your product, or analyze DNA sequences, manipulating and restructuring text is a fundamental task. Fortunately, recent developments in the R programming environment have made many of these tasks more intuitive. The tidytext package, a new addition to the tidyverse collection of packages, provides an intuitive interface for manipulating text data. Using the tidytext package, you will learn how to take unstructured song lyrics, wrangle them into a “tidy” format for analysis, and perform a basic sentiment analysis with the data. The focus of this workshop will be on how to manipulate text data into a format ready for analysis, regardless of the software you wish to perform the analysis in (although R has some strong capabilities in this area too). This is an intermediate-level workshop, where we expect participants to be familiar with R and the tidyverse, including pipes (%>%) and the dplyr functions select(), filter(), mutate(), group_by(), and summarize().
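
For readers curious about what a “tidy” text format looks like, here is a minimal sketch of the idea in plain Python rather than R. It is not the tidytext package and not the workshop's own material: the lyrics, the tiny sentiment lexicon, and the function name tidy_tokens are all made up for illustration. The sketch breaks each line of text into one row per word, then counts how many words match each sentiment label.

```python
import re
from collections import Counter

# Hypothetical song lyrics (invented for this example).
lyrics = [
    "I love the sunny morning",
    "but I hate the lonely night",
]

# Hypothetical, hand-made sentiment lexicon; real lexicons are far larger.
lexicon = {
    "love": "positive",
    "sunny": "positive",
    "hate": "negative",
    "lonely": "negative",
}


def tidy_tokens(lines):
    """Return one row (a dict) per word, recording which line it came from."""
    rows = []
    for line_number, line in enumerate(lines, start=1):
        for word in re.findall(r"[a-z']+", line.lower()):
            rows.append({"line": line_number, "word": word})
    return rows


# "Tidy" table: one token per row.
tokens = tidy_tokens(lyrics)

# Basic sentiment analysis: keep only words found in the lexicon and count labels.
sentiment_counts = Counter(
    lexicon[row["word"]] for row in tokens if row["word"] in lexicon
)

print(tokens[:3])
print(sentiment_counts)
```

The one-token-per-row layout is what makes the later counting step a simple lookup and tally; in R, the tidyverse verbs listed above play the same role.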

Introduction to Data Analysis with Python, March 25 @ 1:00 pm – 2:30 pm

In this hands-on workshop, we will learn the basics of data exploration, analysis, and visualization with Python. We will introduce and work with the Pandas library and Jupyter Notebook, tools that have become the standard for data analysis with Python. Some introductory Python knowledge is necessary, since we will not have time to go over the basics during the workshop. But do not let that scare you. If you take our online Introduction to Python workshop ( https://github.com/DHRI-Curriculum/python ) before coming to this one, you will be fine. This workshop will be held online via Zoom.

Basic audio editing with Audacity, March 31 @ 10:00 am – 11:30 am

This hands-on workshop will be a foundational exploration of Audacity as audio editing software. We will discuss some of its functionalities that can help you edit and clean your audio files to prepare them for publishing and sharing. Prior knowledge and experience are not necessary for the workshop, though having a pre-recorded sound clip/file (~1 min) that you can practice on is encouraged.

Intro to Making an Interactive Map, April 7 @ 1:00 pm – 3:00 pm

Interactive maps have increasingly become an effective tool for communicating one’s research and engaging a broad audience. This workshop will introduce you to the basics of making an interactive map that can be shared and embedded in a website. No mapping experience is necessary. This hands-on workshop will take place live over Zoom.

Web Scraping with Python (bs4), April 19 @ 12:00 pm – 1:00 pm

This workshop explores a Python library that allows users to work with and analyze web-based data. The Beautiful Soup (bs4) Python library enables users to pull data out of web pages made of HTML and XML. They can then use bs4 to search, categorize, and structure HTML and XML documents. Bs4 is relatively lightweight and easy to get up and running, and we will practice scraping popular websites like the New York Times. Though bs4 offers a beginner-friendly approach to web scraping, some basic familiarity with Python and HTML is highly encouraged. NOTE: This workshop will also be included as part of the Carnegie EdTech Fellow Workshop Series.

This entry is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license.
