Context

The Applied Social Data Science (ASDS) Programme is a postgraduate diploma (PG Dip.) offered by the Department of Political Science and funded by the HEA’s Springboard+ initiative. The target student group for Springboard+ programmes is unemployed people with a previous history of employment, those in employment, and those returning to the workforce. In practice, approximately half of the students are recent graduates. The programme is close to completing its second year, and next year will also be offered as a paid MSc (i.e. with an additional 30 ECTS dissertation). ASDS modules are offered not only to Springboard+ students but also to Political Science PhD researchers, and the Statistics I and II modules are compulsory methods classes for first-year PhD researchers.

Need

Following year one of the programme, it became clear from student feedback that teaching did not always meet expectations, particularly regarding the retraining and upskilling aspect of Springboard+ programmes. Students felt that the programme was too narrowly focussed on purely academic skills appropriate for PhD researchers and those wishing to pursue a career in higher education, and did not give enough attention to the broader skillset required to work as a data scientist in the private or public sectors.

A need was therefore identified to improve professionalisation and to introduce students to the wider set of techniques necessary to be competitive in the job market. These include managing the data science workflow, using common data science platforms for collaboration, and the more practical coding skills needed to wrangle data and communicate results.

Proposed Solution

To meet this need I redesigned my teaching for Statistics I, a compulsory 10 ECTS module offered to both ASDS students and first-year PhD researchers. Teaching on this 10-week module is currently divided between two hours per week of lectures, given by a colleague, and two hours per week of tutorials/labs, taught by me. In year one of the programme these tutorials were divided into two one-hour sessions, one for each of two groups of students.

My proposed solution involved combining the two groups into a single two-hour class, and then using the additional time available to restructure the pedagogical approach of the tutorials around a more praxis-oriented method. Rather than treating professional skill acquisition as separate from the learning outcomes, and simply pointing students toward additional external resources, I attempted to model each tutorial around a realistic data science project workflow, including the systems and tools necessary to complete the task.

Within this approach an emphasis was placed on iteration: the same processes were repeated each week, with progressive complications and technical challenges added to stretch students whilst reinforcing a mental map of data science as a set of practices.

To begin with, processes were introduced simply as workflow (i.e., without an explicit requirement to code or engage with overly technical resources); once the motivation for an approach was clear, the technical aspect of practice was then gradually introduced, until students were able to grasp both why and how certain processes were followed.

Implementation

Implementation of the new teaching strategy involved advancing pedagogy in three specific areas: firstly, new teaching material was developed in skills for workflow management and professionalisation; secondly, synchronous teaching sessions were redesigned to include greater focus on collaboration, group work and peer learning; and finally, additional opportunities for formative assessment and continuous feedback were integrated into the teaching design. Each of these areas involved embedding digital pedagogy in some form.

Workflow Management and Professionalisation

To embed good workflow practices, the first five weeks of tutorials were designed as discrete mini projects. I searched online for resources by data scientists who described their own workflow in a pragmatic way, and eventually decided on the approach used by Pat Schloss (Riffomonas), a data scientist working in genetics who uses a combination of RStudio and GitHub for version control and collaborative work.

In the existing teaching approach to this module, students were already required to sign up to GitHub. GitHub is an online platform for hosting and sharing code which makes use of Git, a version control system. Git and GitHub are both widely used in industry, and skill acquisition in both is thus useful for professionalisation. However, Git and GitHub were previously used only as a method for distributing and grading assignments, with no explicit instruction provided, and students thus struggled both with motivation (why) and implementation (how).

I therefore decided to integrate Git and GitHub explicitly within the weekly workflow of tutorials. Each week I placed a tutorial folder within the Statistics I repository, further divided into sub-folders for code, data and results. When students forked the main repository they gained access to copies of these folders on their own system, which they could then work with in RStudio. In this way, I could model for students both the process needed to upload their own assignments and good practice in managing future data science projects (version control, separation of code from data from results, etc.).
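The weekly folder layout described above can be illustrated with a short sketch. This is not the script used on the module; it is a minimal Python example under stated assumptions. The function name make_tutorial_skeleton, the folder name pattern tutorial_01 and the .gitkeep placeholder files are all hypothetical choices made for illustration. Only the three sub-folders (code, data, results) come from the text.

```python
from pathlib import Path
import tempfile

# Sub-folders named in the text; everything else below is an assumption.
SUBFOLDERS = ("code", "data", "results")


def make_tutorial_skeleton(repo_root: Path, week: int) -> Path:
    """Create a weekly tutorial folder with code/data/results sub-folders.

    The "tutorial_NN" naming pattern is hypothetical.
    """
    tutorial_dir = repo_root / f"tutorial_{week:02d}"
    for name in SUBFOLDERS:
        sub = tutorial_dir / name
        sub.mkdir(parents=True, exist_ok=True)
        # Git does not track empty directories, so a placeholder file
        # (conventionally called .gitkeep) keeps each folder in the repo.
        (sub / ".gitkeep").touch()
    return tutorial_dir


if __name__ == "__main__":
    # Demonstrate in a throwaway directory rather than a real repository.
    with tempfile.TemporaryDirectory() as tmp:
        created = make_tutorial_skeleton(Path(tmp), week=1)
        print(sorted(p.name for p in created.iterdir()))
```

Keeping code, data and results in separate folders means that generated output can be deleted and rebuilt from the code and the raw data, which is one reason this separation is commonly recommended for version-controlled projects.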