Software Engineering for Data Scientists

Asynchronous

This program is included with Career Launch Cohort Enrollment and Erdős Institute Alumni Club Membership at no additional cost.

To access the program content, you must first create an account and member profile and be logged in.

Category

Launch, Supplemental, Self-Directed, Mini-Course

Overview

The Software Engineering for Data Scientists course is meant to help data scientists write production-ready code and gain familiarity with the tools used to make models available to their users. The core idea we will explore is making code robust and reusable across a team. This course can also serve as an introduction to ideas used in ML Ops and Data Engineering.

Slack

#slack-channel

Organizers, Instructors, and Advisors

Steven Gubkin

Head of Training and Assessment

Office Hours:

By appointment only

Email:

Preferred Contact:

Slack

Your primary contact for GitHub access.

Kevin Nowland

Lead instructor, ML Ops Engineer

Office Hours:

Intermittent Thursday Afternoons

Email:

Preferred Contact:

Slack

Please reach out on Slack if you have any questions about the content in this course!

Objectives

After completing this course, you will be able to do the following:
- Understand common tools used to deploy models for real-time inference
- Improve your code's robustness through unit testing
- Improve your code's readability through using linters and type checking
- Use basic command line commands
- Implement a simple continuous integration pipeline using GitHub Actions

First Steps/Prerequisites

  • Figure out how to access a terminal emulator, e.g., the Terminal program on macOS or Ubuntu
  • If using Windows, enable the Windows Subsystem for Linux and access a terminal emulator
  • Download pyenv and use it to install Python 3.10.x

Program Content

Textbook/Notes

Containerization

Productionization

Containerization is a modern answer to the question of how to package up simple applications, such as a web server making a model available. This video explores the basics of building images and running containers.

Slides
Transcript

Serialization

Productionization

In this short video we’ll talk about serialization and deserialization of Python objects to save them to disk, to talk with other languages, and to send data over the internet.

Slides
Transcript
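
As a rough illustration of the serialization round trip described above, here is a minimal sketch using two standard-library modules, `json` and `pickle`. The `params` dictionary is a made-up example, not material from the video, and the video may use other formats.

```python
import json
import pickle

# Hypothetical model parameters, invented for this illustration.
params = {"weights": [0.5, -1.25, 3.0], "bias": 0.1}

# JSON is a text format that other languages can read, which makes it
# a common choice for sending data over the internet.
json_text = json.dumps(params)
restored_from_json = json.loads(json_text)

# pickle is a Python-specific binary format. It handles many Python
# objects, but only unpickle data that comes from a source you trust.
pickle_bytes = pickle.dumps(params)
restored_from_pickle = pickle.loads(pickle_bytes)
```

Either `json_text` or `pickle_bytes` could be written to disk and read back later; the trade-off is portability (JSON) versus coverage of arbitrary Python objects (pickle).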

Functional Programming

Theory

In functional programming, functions are first-class objects. In addition, data is usually taken to be immutable. In this video we explore the consequences of these ideas.

Slides
Transcript
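
To make the two ideas above concrete, here is a small sketch in which functions are passed around as values and the input data is an immutable tuple. The helper names (`compose`, `add_one`, `double`) are hypothetical and chosen only for this example.

```python
from typing import Callable


def compose(*funcs: Callable[[int], int]) -> Callable[[int], int]:
    """Return a function that applies funcs from right to left."""
    def composed(x: int) -> int:
        for f in reversed(funcs):
            x = f(x)
        return x
    return composed


def add_one(x: int) -> int:
    return x + 1


def double(x: int) -> int:
    return x * 2


# Functions are ordinary values: they can be passed in and returned.
add_then_double = compose(double, add_one)

# A tuple is immutable, so transforming it builds a new tuple
# and leaves the original untouched.
readings = (1, 2, 3)
transformed = tuple(map(add_then_double, readings))
```

Because `readings` cannot be modified in place, any code holding a reference to it can rely on its contents staying the same.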

Trunk based development

Code quality

How do we incorporate changes to a repository when multiple team members are working on the repository at once?

Slides
Transcript

Packages

Code quality

We present one way to write a pip-installable Python package, refactoring code written in the previous video.

Slides
Transcript

Text Editors

Getting ready to code

A brief introduction to the command-line editor vim and a comparison with VS Code, a full IDE.

Slides
Transcript
Code

Two Web Frameworks

Productionization

We will create a simple web app using two popular Python web frameworks, Flask and FastAPI, putting to use the knowledge from the previous couple of videos.

Slides
Transcript

Parallelism

Theory

We talk about vectorization using NumPy and pandas, and introduce processes and threads while discussing the differences between I/O-bound and compute-bound workloads.

Slides
Transcript
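
As a sketch of why threads help with I/O-bound work, the example below simulates a slow network call with `time.sleep` and runs it both sequentially and in a thread pool. The `fake_download` function is a stand-in invented for this illustration; it is not from the video.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(item_id: int) -> int:
    """Pretend to wait on a network call, then return a result."""
    time.sleep(0.1)  # stands in for waiting on I/O
    return item_id * 2


def run_sequential(ids: list[int]) -> list[int]:
    # Each call waits for the previous one to finish.
    return [fake_download(i) for i in ids]


def run_threaded(ids: list[int]) -> list[int]:
    # Threads overlap the waiting time; map() keeps the input order.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(fake_download, ids))
```

Both functions return the same results, but the threaded version spends most of its waiting time in parallel. For compute-bound work in CPython, threads generally do not give the same benefit, and processes or vectorized libraries are the usual alternatives.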

Object Oriented Programming

Theory

This video focuses on object-oriented programming. We’ll talk about the Liskov substitution principle and then discuss some drawbacks of an object-oriented programming style.

Slides
Transcript
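
One way to picture the Liskov substitution principle mentioned above: code written against a base class should keep working when handed any subclass. The sketch below uses hypothetical `Predictor` classes that are not taken from the video.

```python
from abc import ABC, abstractmethod


class Predictor(ABC):
    """Base type: every predictor maps one float to one float."""

    @abstractmethod
    def predict(self, x: float) -> float:
        ...


class ConstantPredictor(Predictor):
    def __init__(self, value: float) -> None:
        self.value = value

    def predict(self, x: float) -> float:
        return self.value


class LinearPredictor(Predictor):
    def __init__(self, slope: float, intercept: float) -> None:
        self.slope = slope
        self.intercept = intercept

    def predict(self, x: float) -> float:
        return self.slope * x + self.intercept


def total_prediction(model: Predictor, xs: list[float]) -> float:
    # Relies only on the base-class contract, so any subclass that
    # honors that contract can be substituted here.
    return sum(model.predict(x) for x in xs)
```

A subclass that, for example, raised an error for negative inputs or returned a string would break `total_prediction` and would therefore violate the principle.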

Type hints

Code quality

In this video we’re going to talk about type hints, which are a way to help you and your teammates know what types of objects a function requires and what it should return.

Slides
Transcript
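
A minimal sketch of what type hints look like in practice, assuming Python 3.9 or later for the built-in generic syntax `list[float]`. The `normalize` function is a made-up example.

```python
def normalize(values: list[float]) -> list[float]:
    """Scale values so that they sum to 1."""
    total = sum(values)
    if total == 0:
        raise ValueError("values must not sum to zero")
    return [v / total for v in values]


# The annotations tell a reader (and a type checker) that a list of
# floats goes in and a list of floats comes out.
shares = normalize([1.0, 3.0])
```

Python does not enforce these hints at runtime; a separate type checker such as mypy is what flags a call like `normalize("abc")` before the code runs.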

Code style

Code quality

Taking to heart the truism that each line of code will be read more often than it is written, we explore Python style as commonly used in the greater Python community.

Slides
Transcript

Intro to the CLI - part 2

Getting ready to code

We'll talk about configuring your shell, file ownership and permissions, how to talk to other computers, and other useful command line tools.

Slides
Transcript
Code

REST and HTTP

Productionization

In this video I’m going to give an introduction to the HTTP protocol and REST endpoints, starting us on the path toward exposing a model over the internet.

Slides
Transcript

Intro to Data Structures & Algorithms

Theory

Akul Dewan gives a lecture introducing core concepts in data structures and algorithms.

Slides
Transcript
Code

Continuous integration

Code quality

How do we enable short-lived branches while maintaining code quality? Continuous integration tools can help.

Slides
Transcript

Unit testing

Code quality

We talk about refactoring code to be DRY, the scope of functions, and unit testing some (but not all!) of them.

Slides
Transcript
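
As an illustration of the workflow described above, the sketch below pulls a repeated calculation into one small function and then unit tests that function. All names are hypothetical. The tests are plain functions with `assert` statements, a style that pytest can discover and run, though the video may use a different test framework.

```python
def to_celsius(fahrenheit: float) -> float:
    """Single home for the conversion, instead of repeating the formula."""
    return (fahrenheit - 32) * 5 / 9


def summarize_temperatures(readings_f: list[float]) -> dict[str, float]:
    # Reuses to_celsius rather than duplicating the arithmetic (DRY).
    celsius = [to_celsius(r) for r in readings_f]
    return {"min": min(celsius), "max": max(celsius)}


def test_to_celsius_freezing_point() -> None:
    assert to_celsius(32) == 0


def test_to_celsius_boiling_point() -> None:
    assert to_celsius(212) == 100
```

The small, pure function `to_celsius` is the natural unit to test; `summarize_temperatures` mostly wires pieces together, which is one reason not every function needs its own unit test.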

Dependency management

Getting ready to code

How to set up a Python environment, with an emphasis on communicating what the environment is across a team.

Slides
Transcript

Intro to the CLI - part 1

Getting ready to code

We’ll be talking about the different shells that allow you to interact with your computer, navigating the filesystem, and basic ways to manipulate files.

Slides
Transcript
Code

Project/Homework Instructions

Project/Team Formation
Project Submission
Projects README

Schedule

Please check your registration email for the program schedule and Zoom links.

Project/Homework Deadlines