Fundamentals of Data Science
(MA7419 / MA3419)
Getting Set Up
Welcome to Fundamentals of Data Science
(MA7419 / MA3419)
Congratulations. By the end of this module you will be confident in handling data in a collaborative and reproducible way. In other words, in a professional way.
This is a vital skill for almost any of the jobs you’re likely to move on to, actuarial or not. The module is for anyone who wants to be a professional (in all the good senses of that word) data scientist.
There’s not going to be a lot of listening to lectures. Instead you will be developing the hands-on R programming skills to solve real data problems. And to do that, you’ll need the right software - so that’s the first step. We’ll be using R (R Core Team 2022) and RStudio (Posit team 2023), two amazing - and free - programs.
Installing R and RStudio
If you have a suitable laptop, you’re probably going to want to install R and RStudio on your own machine. This is good experience and lets you experiment however you like. Instructions for that are given below. I also strongly suggest you set up the project and folder structure described in the study material for Week 1.
You can also run RStudio on University PCs.
For most of the work in this module we will be using a University of Leicester server, which you can access from any device that can run a browser. So if you use a Chromebook you will be able to work just as well as people with machines running Windows or macOS. You’ll also be able to do a certain amount on a decent-sized tablet, and you’ll have access via a phone - though I think it would be hard to do any significant work that way.
Once your account has been set up (at the beginning of the semester) you should be able to access the server here:
https://rserver.mcs.le.ac.uk/rstudio/
You can then log in using your usual University user name (e.g. pk255) and your University password.
There is a final option. You could investigate Posit Cloud. Creating a free account there should give you everything you need, again through a browser. The disadvantage is that you only get 15 hours (last time I looked) per month for free. However, it can be useful as a last resort if the other options fail (especially if you’ve kept good backups of your work so you can upload them and carry on seamlessly). If you decide to make an account, don’t use your University password for it, or for any other cloud account.
We’ll make sure everyone is up and running in the first week’s sessions, so don’t panic if things don’t work immediately.
Please bring your laptop to lectures so that you can follow along with what I’m doing at the front of the room. However, we won’t be able to provide individual support in lectures; that’s what the computer lab sessions are for.
Installing R on your own machine
The internet is overflowing with great, free R resources. We’re going to be using a number of them as the main reading for the module, and for instructions on installing software we might as well dive straight into an online book we’ll use a lot: R for Data Science (Wickham and Grolemund 2017).
Read Chapter 1, Introduction (confusingly, Chapter 2 is also called Introduction!), and follow the instructions you’ll find there to install R, RStudio, and the Tidyverse family of packages.
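Once R and RStudio are installed, something like the following, typed into the RStudio console, is a quick way to check that everything is in place. Treat it as a minimal sketch rather than a replacement for the book's instructions: it assumes you have an internet connection and that a CRAN mirror is already configured (RStudio normally sets one for you).

```r
# Show which version of R is running
print(R.version.string)

# Install the tidyverse only if it is not already installed.
# Assumption: a CRAN mirror is configured, as it is by default in RStudio.
if (!requireNamespace("tidyverse", quietly = TRUE)) {
  install.packages("tidyverse")
}

# Load the tidyverse; this should list the attached core packages
library(tidyverse)
```

If the last line runs without an error, the Tidyverse packages are installed and ready to use.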
You might also want to start reading R Programming for Data Science (Peng 2020). There are instructions for installing the software in Chapter 3 (please ignore Section 3.2).
DataCamp
DataCamp is a website containing a very large number of programming courses - including many on R.
Subscriptions are usually about 30 USD per month, but we have arranged for you to have free access for the duration of this module.
Using the link on the Blackboard site, you can register for an account and use the resources there to improve your R skills. (You will need your University email address for this one, but do not use your University password.)
We’ve found that the DataCamp system can take a while to recognise you’ve signed in with a free account. You might well find that after completing a few free chapters you are invited to sign up for a paid subscription. DO NOT DO THIS. Instead, just log out and log back in after 24 hours, when you should find you can access everything for free.
Getting help
There are a lot of students on this module and just one lecturer and a few TAs. We want you to ask lots of questions, but if you email them all to us we will probably be overwhelmed, so please post your questions on the Blackboard forum.
Code of Conduct
- Help each other.
- Be nice.
- Acknowledge help you get from others. (If someone has been particularly helpful - or nice - let us all know on the forum or email me.)
- Don’t plagiarise, don’t collaborate if you have been asked not to, and don’t commit any other form of academic misconduct. You will almost certainly copy and paste code written by other people - acknowledge its source. The one exception is that you don’t have to acknowledge the source of every bit of code you are given in this course - that could get tedious.
- Respect other people’s right to an undisturbed lecture.
- Help each other.
Question for you: Is this a good list? What should be added or taken away?