Data Science Boot Camp

Fall 2024

Sep 5, 2024

Dec 13, 2024

This program is included with Fall 2024 Career Launch Cohort Enrollment and Erdős Institute Alumni Club Membership at no additional cost.

To access the program content, you must first create an account and member profile and be logged in.

Registration Deadlines

Sep 6, 2024

All Erdős Fall 2024 Career Launch Cohort or Alumni Club members who are not participating in the UX Research or Deep Learning Boot Camps

TEAM

Aware NLP Project III

Mohammad Nooranidoost, Baian Liu, Craig Franze, Mustafa Anıl Tokmak, Himanshu Raj, Peter Williams

This project involves the investigation and evaluation of different methodologies for retrieval for use in RAG (Retrieval-Augmented Generation) systems. In particular, this project investigates retrieval quality for information downloaded from employee subreddits. We investigated the impacts of using clustering, multi-vector indexing, and multi-querying in advanced retrieval methodologies against baseline naive retrieval.

First Steps/Prerequisites

Computer Setup Day/First Steps

There are some computer setup steps you need to complete before the first lecture. We will meet on 09/05/2024 on Zoom to make sure that we have all done the following:

Cloned the GitHub repo locally.

Installed the conda environment.

Run a Jupyter Notebook using that conda environment.

Detailed instructions (created by teaching assistant Ness Mayker Chen) can be found at this link.
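
As a quick sanity check for the steps above, the following minimal sketch can be run in a notebook cell to see which Python interpreter the kernel is using and whether a few packages are importable. It is not part of the official instructions. The function name describe_environment is hypothetical, and the package names numpy and pandas are assumptions standing in for whatever the repo's environment file actually lists.

```python
import importlib.util
import os
import sys


def describe_environment(required=("numpy", "pandas")):
    """Report the running interpreter and any packages that cannot be found.

    The default package names are placeholders (an assumption); replace them
    with the packages listed in the course repo's environment file.
    """
    return {
        # Path of the interpreter running this code; inside the conda
        # environment it should point into that environment's folder.
        "python": sys.executable,
        # Set by `conda activate`; may be None if the kernel was started
        # some other way, so treat it as a hint rather than proof.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
        # Packages that the import system cannot locate.
        "missing": [
            name for name in required
            if importlib.util.find_spec(name) is None
        ],
    }


if __name__ == "__main__":
    report = describe_environment()
    print("Interpreter:", report["python"])
    print("Conda environment:", report["conda_env"])
    print("Missing packages:", report["missing"] or "none")
```

If the interpreter path does not point into the conda environment you installed, the notebook is probably running on a different kernel; the detailed instructions linked above are the authoritative reference for fixing that.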

We will test your ability to do these things by having you submit a "secret code". You will obtain this code by successfully running the notebook

computer_setup_day/find_secret_code.ipynb

When you have obtained the code, enter it in the text box at https://www.erdosinstitute.org/ds-boot-camp-prep

If you can do these things independently, please show up to help your colleagues!

If you cannot do these things independently, please show up to get help from your colleagues!

Prerequisites

In addition to these computer setup steps there are also some content prerequisites:

Basic familiarity with Python

Differential calculus. Ideally you also know some multivariate differential calculus and linear algebra.

Basic statistics and probability

Program Content

https://github.com/TheErdosInstitute/data-science-fall-2024

Textbook/Notes

Alec's Lost Introduction

Live Lectures

Due to a technical glitch, the audio did not work during the orientation lecture for the video Alec made to introduce himself and share relevant information about projects. You can watch it now!

Slides

Transcript

Code

Gradient Boosting

11 Week: Ensemble Learning II (prerecorded)

A second boosting algorithm that is loosely associated with gradients.

Slides

THE ERDŐS INSTITUTE

Helping PhDs get and create jobs they love at every stage of their career.

Week 12: Neural Networks

12 Week: Neural Networks (prerecorded)

keras

12 Week: Neural Networks (prerecorded)

Introduction to Recurrent Neural Networks I

12 Week: Neural Networks (prerecorded)

tSNE

Bonus content (prerecorded)

Hierarchical Clustering

Bonus content (prerecorded)

Regression Version of Classification Algorithms

Bonus content (prerecorded)

How To Form Projects

Presentation Tips and Tricks (prerecorded)

How to clone the GitHub Repo

Technical Support

Data Source Websites

Data Collection (prerecorded)

Data in Databases

Data Collection (prerecorded)

Data Splits

02 Week: Regression I (prerecorded)

Multiple Linear Regression

02 Week: Regression I (prerecorded)

Scaling Data

03 Week: Regression II (prerecorded)

Bias-Variance Trade-Off

04 Week: Regression III (prerecorded)

PCA and Basketball

Bonus content (prerecorded)

Adjustments for Time Series Data

07 Week: Time Series I (prerecorded)

Rolling Averages

07 Week: Time Series I (prerecorded)

Autoregressive (AR(p)) Models

07 Week: Time Series II (prerecorded)

SARIMA

07 Week: Time Series II (prerecorded)

The Confusion Matrix

08 Week: Classification I (prerecorded)

Bayes' Based Classifiers I

09 Week: Classification II (prerecorded)

Support Vector Machines I

09 Week: Classification II (prerecorded)

What is Ensemble Learning?

10 Week: Ensemble Learning I (prerecorded)

Boosting

11 Week: Ensemble Learning II (prerecorded)

Math Hour 1

Math Hour

XGBoost

11 Week: Ensemble Learning II (prerecorded)

Perceptrons

12 Week: Neural Networks (prerecorded)

Introduction to Convolutional Neural Networks

12 Week: Neural Networks (prerecorded)

Loading Pre-Trained Models

12 Week: Neural Networks (prerecorded)

What is Clustering?

Bonus content (prerecorded)

Imputation

Bonus content (prerecorded)

Gradient Descent

Bonus content (prerecorded)

General Presentation Tips

Presentation Tips and Tricks (prerecorded)

Organizing your work

Technical Support

Web Scraping with BeautifulSoup