WEBVTT 00:00:00.000 --> 00:00:03.000 And I think we should be recording now. Okay, great. Hi, welcome, everybody. 00:00:03.000 --> 00:00:06.000 Welcome to the first day of the 2023 May boot camp. So today we're gonna do an introduction to everything. 00:00:06.000 --> 00:00:25.000 Let me go ahead and get my slides ready, and then share the slides, and we'll get going. 00:00:25.000 --> 00:00:29.000 Okay, so this is our intro to the 2023 May data science boot camp. 00:00:29.000 --> 00:00:42.000 I'm Matt Osborne. I'll introduce myself a little bit in a couple of slides. So I just want to say welcome and sort of give you a rundown of what the goals of the boot camp are. As we go through the presentation, you know, if 00:00:42.000 --> 00:00:53.000 there's a pressing question, feel free to ask, and if you have, you know, a concern or something, also feel free. You can either post it in the chat, or you can just chime in and ask it. 00:00:53.000 --> 00:00:58.000 If you do the raise hand thing, I have a hard time seeing that because I'm sharing my screen. 00:00:58.000 --> 00:01:05.000 But you know you can just say it, and then I'll do my best to answer, and then we'll have plenty of time for questions and answers and other sorts of concerns 00:01:05.000 --> 00:01:08.000 you might have at the end of the slides. So what are we gonna try and do? We're gonna learn some Python, hopefully, in the boot camp. 00:01:08.000 --> 00:01:12.000 We'll learn some data science, hopefully, in the boot camp. 00:01:12.000 --> 00:01:22.000 And then by the end, you'll complete a data science project, and Alec will talk more about that in a little bit. 00:01:22.000 --> 00:01:28.000 So I want to start by going over, like, your top 2 resources that aren't just people. 00:01:28.000 --> 00:01:35.000 So the 2 top resources for a lot of the questions you might have, or trying to figure out when things happen or where to go. 
00:01:35.000 --> 00:01:43.000 The first is the boot camp website that Roman's done a great job of, you know, setting up. You probably came here today to, you know, know when stuff was happening. 00:01:43.000 --> 00:01:46.000 So here's what the website looks like, if you haven't seen it. 00:01:46.000 --> 00:01:50.000 A big thing to look at is it has both a syllabus and a PDF 00:01:50.000 --> 00:01:56.000 of a schedule. It has what the goal of the boot camp is, 00:01:56.000 --> 00:02:02.000 examples of projects from previous boot camps, first steps, which I'll be referring to a lot in this lecture, 00:02:02.000 --> 00:02:08.000 and then program content. And so program content includes a link to our GitHub, which I'll talk about later, as well as videos, which I'll also talk about later. 00:02:08.000 --> 00:02:20.000 And, you know, everything you could want. And at the bottom there's a schedule where, if you're on this, either you did it through your profile, or you came here. 00:02:20.000 --> 00:02:26.000 It has a schedule for things like my office hours from last week. I'll be adding some more later this week. 00:02:26.000 --> 00:02:34.000 Our lecture times and our problem session times, as well as, like, deadlines for the projects, which Alec 00:02:34.000 --> 00:02:40.000 will talk about. So getting back to our slides. 00:02:40.000 --> 00:02:47.000 So the other thing you're gonna want to go to and make sure you understand how to access is the Erdős Institute Slack, and in particular, 00:02:47.000 --> 00:02:54.000 look for the spring 2023 cohort channel. So this is going to be where we post all of our announcements about this spring boot camp. 
00:02:54.000 --> 00:03:09.000 So, for instance, if something needs to be changed, like maybe there's a typo in the notes or something like that, or if, you know, some new event pops up that we didn't originally plan for, maybe somebody wants to come talk about something from a company, we've had that in the past, check for 00:03:09.000 --> 00:03:12.000 it there as well as in your Erdős profile. 00:03:12.000 --> 00:03:18.000 So one of the people that you'll be working with the most on, like, sort of the administrative side is myself. 00:03:18.000 --> 00:03:31.000 I'm the lecturer. I've been the head of boot camps, as Roman said before we started recording. I've been doing the head of boot camp stuff since 2020, and I've been involved with the boot camp since the beginning. I got my PhD 00:03:31.000 --> 00:03:47.000 from Ohio State in mathematics in 2020, and since then I've been doing this as well as working on research with my former advisor. The second person that you're gonna probably work with the most, you know, in terms of organizing things, is Alec Clott, and he'll talk in a 00:03:47.000 --> 00:03:51.000 little bit. He's been the head of data science projects for the past couple years. 00:03:51.000 --> 00:03:59.000 He's also a senior principal in quantitative analytics and data science at Gartner, and he graduated from OSU 00:03:59.000 --> 00:04:09.000 political science with a PhD in 2021. Alec does a great job, and he's going to tell you all about, like, the group project aspect of the boot camp in just a few minutes. 00:04:09.000 --> 00:04:12.000 And then the last person, if you haven't been in contact with her before, you'll probably be contacting her, you know, if you have troubles, like, getting set up with communications. 00:04:12.000 --> 00:04:20.000 So her name's Olivia, and this is a typo. 00:04:20.000 --> 00:04:27.000 It should be Olivia, not Olivia. She's a graduate of Ohio State, with a Bachelor of Arts in medical anthropology. 
00:04:27.000 --> 00:04:31.000 She's about to start law school at the University of Virginia in this coming year. 00:04:31.000 --> 00:04:36.000 She's going to be your top contact for Slack channel access and GitHub repository access. 00:04:36.000 --> 00:04:37.000 So remember it's Olivia, not Olivia. 00:04:37.000 --> 00:04:40.000 The slides that I'll upload will have her correct name. 00:04:40.000 --> 00:04:44.000 So just keep her in mind if there's ever, like, 00:04:44.000 --> 00:04:51.000 I can't see the Slack channel, or I can't see the GitHub. Message her. She's the person who's going to help you get set up. 00:04:51.000 --> 00:04:52.000 She does a great job. So as I've been promising, next is group project information. 00:04:52.000 --> 00:04:59.000 Alec is here, so I wanna give him the time to get everything done so he can go on, you know, with his life. 00:04:59.000 --> 00:05:09.000 So I'm gonna turn over the presentation to Alec, who's gonna talk about the projects. 00:05:09.000 --> 00:05:15.000 Awesome. Let me go ahead and see if I can share my screen. 00:05:15.000 --> 00:05:20.000 And I can. Alright, can everybody see this and hear me? 00:05:20.000 --> 00:05:23.000 Great. Okay. 00:05:23.000 --> 00:05:24.000 Alright, hi everyone! As Matt said, my name is Alec. 00:05:24.000 --> 00:05:31.000 I have been kind of working with Erdős for a while now, the past few years. 00:05:31.000 --> 00:05:34.000 I was first introduced as a participant in the boot camp myself. 00:05:34.000 --> 00:05:39.000 All the way back in 2020, when that was still Diesel, instructor. 00:05:39.000 --> 00:05:55.000 So everything you're kind of doing right now, I've been through. So if you have any other kind of questions, or if there's any help that I can be, please don't hesitate to reach out. What I want to do right now is tell you a little bit about the project 00:05:55.000 --> 00:06:04.000 component of this boot camp and kind of specifically focus on what your next steps are for this week. 
00:06:04.000 --> 00:06:24.000 So, as Matt said, though, if anybody has questions, feel free to reach out on Slack, and I'll be holding regular office hours as well that you can join. So as you should be aware, one of the things that you'll be doing in this is a team project, and the way that we kind of approach 00:06:24.000 --> 00:06:32.000 this is that we want this project to be an opportunity to work with real-world data and produce findings in a short time span. 00:06:32.000 --> 00:06:51.000 So the reason for this, as we'll talk about, is it's an excellent resume builder and something you can talk about in interviews, and it's a really great opportunity to actually get insight on the type of work that you may be doing as a data scientist. And the kind of 00:06:51.000 --> 00:06:58.000 cool thing here is that you have pretty much full autonomy with your group to choose the direction for your project, both substantively, so maybe you're interested in finance, 00:06:58.000 --> 00:07:11.000 tech, health policy, UX. We've had projects in all of these fields, and more. 00:07:11.000 --> 00:07:12.000 And you also have a lot of autonomy in terms of, like, technique. 00:07:12.000 --> 00:07:18.000 Are you specifically wanting to focus on image classification, or natural language processing, time series, 00:07:18.000 --> 00:07:36.000 so on? And so the project is very much something you and your group will make your own, and one of the things that we find, as you're working with kind of the resources here at Erdős, is that building a portfolio is kind of crucial in this market. 00:07:36.000 --> 00:07:40.000 So this is an opportunity for you to be able to do that. 00:07:40.000 --> 00:07:48.000 Okay, so for the projects, the end result is gonna be a portfolio-worthy data science project and product. 00:07:48.000 --> 00:07:56.000 This is going to include a 5 min overview video and presentation, an annotated GitHub, and an executive summary. 
00:07:56.000 --> 00:08:05.000 These are kind of the 3 deliverables that your group will be providing at the end of the month. What we will do then is have all projects, 00:08:05.000 --> 00:08:21.000 so all of these 3 things, reviewed by project judges, who are actually current data scientists from a variety of different fields and practices, and they will score all of these projects. 00:08:21.000 --> 00:08:40.000 For the top 5 projects, we will then share the videos of them with the entire boot camp and have a little session in our closing ceremony for the May 2023 boot camp. One of the best things that you can do to kinda understand what the projects 00:08:40.000 --> 00:08:41.000 look like, and I'll mention this again in a little bit, 00:08:41.000 --> 00:08:45.000 is just to go back and look at our past projects. 00:08:45.000 --> 00:09:00.000 So we have a page where you can see every project that's been submitted over the past few sessions. You can watch their 5 min videos, check out their GitHubs, and really get a sense of the gamut of projects that have been conducted. Okay, so for this week. 00:09:00.000 --> 00:09:06.000 So this is condensed, you know. This is a month that we're doing this in. This week, 00:09:06.000 --> 00:09:10.000 the focus is on team formation and project proposal ideas. 00:09:10.000 --> 00:09:15.000 So we've got hundreds of students from all over the world. 00:09:15.000 --> 00:09:20.000 Some of you might know other attendees from your university or other networks. 00:09:20.000 --> 00:09:22.000 Many of you won't, and that's totally fine. 00:09:22.000 --> 00:09:27.000 I would say the vast majority of people on teams don't know each other. 00:09:27.000 --> 00:09:28.000 Everyone's got different backgrounds and subject areas; that could be, like, your actual field. 00:09:28.000 --> 00:09:34.000 Many of you may have very different experiences with coding. 00:09:34.000 --> 00:09:44.000 That's totally fine. 
That's expected, and you may have different types of goals when it comes to a data science career, or even just for what you're looking to get out of this project. 00:09:44.000 --> 00:09:49.000 That's all totally fine, and expected. The first thing that I would suggest that you do, 00:09:49.000 --> 00:09:54.000 if you haven't done it already, is, as Matt was saying, make yourself familiar with this page. Specifically, 00:09:54.000 --> 00:09:59.000 make sure you go through the project and homework instructions here. 00:09:59.000 --> 00:10:02.000 This should really just say project instructions. I can fix that. 00:10:02.000 --> 00:10:04.000 But this is for the spring 2023. 00:10:04.000 --> 00:10:10.000 We have a possible project list that you can check out, some general advice that 00:10:10.000 --> 00:10:14.000 we've kind of collected and added to over the years, and then our project database. 00:10:14.000 --> 00:10:26.000 So you can actually go and then see all the past projects that have been done over the years, and get a sense of what you might also be doing this year. 00:10:26.000 --> 00:10:34.000 Let me go back to here. So make sure you've read everything on that page. 00:10:34.000 --> 00:10:39.000 The first thing I would say is, this week we're gonna focus on getting teams together. 00:10:39.000 --> 00:10:42.000 So an ideal team size is gonna be 3 to 4 people. 00:10:42.000 --> 00:10:46.000 But we will allow teams of 5 as well, so you can actually form a team of 5 00:10:46.000 --> 00:10:51.000 if that's what you choose. Teams can be formed based on matching substantive interests. 00:10:51.000 --> 00:10:58.000 Maybe you're all very interested in finance, or it could be based on methods, 00:10:58.000 --> 00:11:02.000 if you just know, for instance, you really want to learn NLP. Again, 
00:11:02.000 --> 00:11:14.000 As I mentioned, this is gonna be a portfolio project, and the way this should go is you should be in constant contact with your team over these coming weeks to get it in a state that'll be presented. 00:11:14.000 --> 00:11:29.000 Okay, so I know, sorry, some of this is getting a little repetitive here, but I do want to reiterate that in order to get a certificate from this program you must complete this project from start to finish. The project needs to be coded in Python, you need to 00:11:29.000 --> 00:11:36.000 have that annotated GitHub repository, and the executive summary needs to include information on your project results, 00:11:36.000 --> 00:11:41.000 and then applications. And then that last component is gonna be a 5 min presentation. 00:11:41.000 --> 00:11:47.000 So these are all necessary in order to get that certificate. 00:11:47.000 --> 00:11:54.000 What I want to say is, for this week you have kind of 3 major deadlines that I wanna go ahead and put on everybody's radar. 00:11:54.000 --> 00:11:55.000 So, May twelfth. There's something wrong with the deadline thing that I need to fix or ask 00:11:55.000 --> 00:12:08.000 Roman to fix. But this Friday, at 4:30 in the afternoon, we'll announce this again, but again, this Friday at 4:30 in the afternoon, 00:12:08.000 --> 00:12:17.000 we're gonna be holding a project pitch day, which is just to bring everybody on Zoom and give an opportunity to kind of go into breakout sessions and meet other folks and try to get a project team together, 00:12:17.000 --> 00:12:24.000 if that's something that you want to do, because that. 00:12:24.000 --> 00:12:31.000 May twelfth, at night, if you want to submit a proposal, which is not required, 
00:12:31.000 --> 00:12:48.000 but if you have an idea for a project, that is going to be due that night. So what you want to do is go to the main page, go to Submit team proposal to project formation page, and if you click that it'll actually bring you to the team formation page, where you'll click data science boot 00:12:48.000 --> 00:12:59.000 camp, and you can already see that teams have started to form. So, for instance, let's say, you know, that you really want to do something on, like, 00:12:59.000 --> 00:13:11.000 an image classification project of, like, famous paintings to see if you can predict who the painter was, or something. What you could do is actually create a team name where it's like artist prediction. 00:13:11.000 --> 00:13:17.000 You could put a description here. 00:13:17.000 --> 00:13:28.000 And then you could create that team. Once you hit create, everybody else in the boot camp will see that team down here, so they have the opportunity to join. 00:13:28.000 --> 00:13:34.000 So teams will start to form basically as soon as we're done with lecture today. 00:13:34.000 --> 00:13:37.000 And what you can do is either decide that you wanna propose a team, 00:13:37.000 --> 00:13:42.000 maybe you already know folks or you just know what the topic is that you wanna do, or you could join an existing team. 00:13:42.000 --> 00:14:00.000 So, for instance, we already have a couple of teams that have folks on them, and the idea here is gonna be that by this Friday night, midnight, all possible project ideas have been proposed and will be up on that page. 00:14:00.000 --> 00:14:06.000 And then we'll give everyone until Sunday night to make sure that everyone has joined one. 00:14:06.000 --> 00:14:14.000 So basically, by next Monday everyone will be in their team and you'll be able to hit the ground running with your projects. 00:14:14.000 --> 00:14:20.000 But keep an eye out for more information. 
I'll be posting updates in Slack as reminders, and everything. 00:14:20.000 --> 00:14:25.000 Those are the main things I want to cover. If there's any immediate questions, I'm happy to jump into it, Matt. 00:14:25.000 --> 00:14:26.000 I also don't want to take more time than I already have. 00:14:26.000 --> 00:14:31.000 So you tell me. I can also just answer any questions on Slack, 00:14:31.000 --> 00:14:34.000 if anybody has them. 00:14:34.000 --> 00:14:45.000 Sure. Thanks, Alec. How about we do maybe 5 to 10 min for questions, and then if there's a lot of them, we can cut it off and move it to Slack. 00:14:45.000 --> 00:14:50.000 Great, that sounds awesome. 00:14:50.000 --> 00:14:54.000 I guess there's a bit of a pause. I'll start if that's alright. 00:14:54.000 --> 00:14:56.000 Please. 00:14:56.000 --> 00:15:02.000 So I know that in the past there have been kinda like corporate sponsored projects. 00:15:02.000 --> 00:15:09.000 I wonder if this year there's either something like that, or if maybe you guys have some idea of, like, how to figure out, 00:15:09.000 --> 00:15:18.000 like, you know, if we're aiming for some industry or other, how to figure out, like, what sorts of projects people in that industry would be particularly interested in seeing. 00:15:18.000 --> 00:15:24.000 So, Matt, I don't believe we have any corporate sponsored projects this session. 00:15:24.000 --> 00:15:25.000 Not to my knowledge. Yeah. 00:15:25.000 --> 00:15:31.000 Okay. But that's like, okay, so one of the things that we've, so let me answer this in 2 parts. 00:15:31.000 --> 00:15:43.000 So one of the things that we've learned is our strongest projects tend to be those that teams have kind of come up with on their own, and there's a little bit of an ability to have more of a unique project that way. 
00:15:43.000 --> 00:15:59.000 These projects can still be very aligned with kind of the goals and expectations of a more corporate or industry job, and I'm glad you brought this up, because the last major component of the project that I failed to mention, 00:15:59.000 --> 00:16:04.000 I'm sorry for not doing it, is that every group is going to be assigned a project mentor. 00:16:04.000 --> 00:16:09.000 So what we've done is we actually have a group of folks that are now working in data science. 00:16:09.000 --> 00:16:13.000 A lot of them are former Erdős participants themselves, and they will be assigned to your project, and what they will do is they will be meeting with you ideally once a week. 00:16:13.000 --> 00:16:28.000 But many of them will be more than happy to kind of talk on Slack as well, and they're going to be providing input on your project as it forms and kind of comes together. 00:16:28.000 --> 00:16:43.000 So they're going to be able to provide that kind of inside data science perspective and let you know from their experience in their position, you know, what you can do to strengthen the project, or what might make it more compelling for a non data science audience, 00:16:43.000 --> 00:17:00.000 and so on. We also, you know, have a large network of folks, so, depending on what your project topic ends up being, you can reach out to any of us, and we can try to get you in contact with somebody. There will be a lot of ways in which you can continue to get 00:17:00.000 --> 00:17:10.000 insight on the project, if you seek it. Does that help? 00:17:10.000 --> 00:17:11.000 Of course. 00:17:11.000 --> 00:17:18.000 Yeah, that sounds good. Thank you very much. 00:17:18.000 --> 00:17:19.000 I have a question. 00:17:19.000 --> 00:17:20.000 Yeah. 00:17:20.000 --> 00:17:27.000 Concerning the project and the whole executive summary thing. How long's a good executive summary? 00:17:27.000 --> 00:17:28.000 One page. 
00:17:28.000 --> 00:17:31.000 Okay. 00:17:31.000 --> 00:17:34.000 8 point font, single space. 00:17:34.000 --> 00:17:38.000 I'm kidding. It's like 12.5 or whatever. It's just one page. 00:17:38.000 --> 00:17:39.000 Yeah. 00:17:39.000 --> 00:17:41.000 Well, you gotta be full the first time around. 00:17:41.000 --> 00:17:46.000 Yup! 00:17:46.000 --> 00:17:47.000 No! 00:17:47.000 --> 00:17:48.000 Thanks. 00:17:48.000 --> 00:17:55.000 And could it be a project that we're already working on, or one that's starting to form? 00:17:55.000 --> 00:18:04.000 Yeah, I mean, so I would think that in general we certainly have folks who have done projects that are, like, aligned with their research interests, 00:18:04.000 --> 00:18:09.000 for example, or maybe it's, like, a data set with which they are familiar. 00:18:09.000 --> 00:18:26.000 That's fine, but it does need to be kind of an original work that's conducted in this month, and one of the reasons for that is, I think we really want everyone to be able to take away the experience of kind of having to create a project, 00:18:26.000 --> 00:18:28.000 do the analysis, and get it ready for presentation in a relatively short time span, 00:18:28.000 --> 00:18:38.000 as that's something that you might often be doing as a data scientist. So if that's something you want to connect on kind of via Slack, you can give me more information. 00:18:38.000 --> 00:18:50.000 I'm more than happy to give input. I suspect it's generally fine. But if you just want to talk further about that, we absolutely can. 00:18:50.000 --> 00:18:51.000 Awesome. Thank you. 00:18:51.000 --> 00:18:54.000 Yeah. 00:18:54.000 --> 00:19:01.000 So if we have project ideas, do you propose that we put them into that online thing 00:19:01.000 --> 00:19:02.000 Yup! 00:19:02.000 --> 00:19:05.000 that you showed us, before the 4:30 on Friday event? 00:19:05.000 --> 00:19:20.000 Yeah, that's totally fine. 
So if you have ideas, go ahead and start putting them in there. Worst case scenario, no one joins the team or something, or, I don't even think by having a team, you could create a project idea and then leave it 00:19:20.000 --> 00:19:34.000 yourself. So the project ideas are kind of the objects there themselves. 00:19:34.000 --> 00:19:35.000 Okay. 00:19:35.000 --> 00:19:37.000 But we would love to go in and see stuff start forming this week, so it does not all have to wait until Friday. Not everyone's going to be able to wait or make it into the Friday meeting, so as soon as you have an idea, go ahead and drop it on there. 00:19:37.000 --> 00:19:38.000 Okay. 00:19:38.000 --> 00:19:44.000 Hi, sorry. Can you locate me to that online place again, where they're, like? 00:19:44.000 --> 00:19:52.000 Yeah, absolutely. So if you, actually, let me just share my screen again. 00:19:52.000 --> 00:20:01.000 If you go to the main page here, if you just scroll all the way down to the yellow, or green, I'm not sure, deadlines. 00:20:01.000 --> 00:20:04.000 Oh, okay. Okay. 00:20:04.000 --> 00:20:08.000 If you just click the second one, the Submit team proposal to project formation page, it'll take you there. 00:20:08.000 --> 00:20:10.000 Okay, okay, okay, got it. Yeah. Yeah. 00:20:10.000 --> 00:20:14.000 You'll just have to click data science boot camp. 00:20:14.000 --> 00:20:15.000 Of course. 00:20:15.000 --> 00:20:19.000 Thanks. 00:20:19.000 --> 00:20:20.000 Oh, I'm sorry. There's some questions in the chat. Okay. 00:20:20.000 --> 00:20:25.000 So: we have a general area we want our project to be in, but don't have a specific project in mind. 00:20:25.000 --> 00:20:30.000 Should the next step be to find people via Slack who want to work in a similar area? 00:20:30.000 --> 00:20:42.000 Yes, so that's something that I highly encourage folks to do. You can go into the general Slack chat for spring 2023, and just say, hey, I'm hoping to do a project on this general area. 
Feel free 00:20:42.000 --> 00:20:48.000 to respond in thread or reach out to me if this sounds interesting to you. Definitely 00:20:48.000 --> 00:20:53.000 leverage Slack to your advantage, and never hesitate to kind of put out messages 00:20:53.000 --> 00:21:00.000 there. You do not need to have a data set ready. There's a lot of ways in which we can find one. 00:21:00.000 --> 00:21:02.000 You could look at things like, it's Kaggle, right? 00:21:02.000 --> 00:21:07.000 I believe that has a lot of kind of data sets ready that you could use, or it could be, 00:21:07.000 --> 00:21:13.000 you know, that the data collection is part of the project. 00:21:13.000 --> 00:21:18.000 Someone asked, can we use machine learning methods that are not covered in class? Matt, 00:21:18.000 --> 00:21:22.000 yes, as you mentioned, that is totally fine. And then we have some other 00:21:22.000 --> 00:21:28.000 non-project questions coming up, so I will defer to you on those, Matt. 00:21:28.000 --> 00:21:32.000 Yeah, so the recordings, I'll try and have them posted by, 00:21:32.000 --> 00:21:45.000 you know, the day after. It just depends, like, how I feel after the recording's done. 00:21:45.000 --> 00:21:46.000 Yeah. 00:21:46.000 --> 00:22:03.000 I have a question. If we have a topic in mind, is it best to choose topics that you think you could find a data set for? Like a particular, like, interest, and then knowing that you can find maybe a data set for that? 00:22:03.000 --> 00:22:04.000 Totally fine! 00:22:04.000 --> 00:22:18.000 Then get on the Slack channel. Sorry, I have a baby. Get on the Slack channel and garner people who are interested, because maybe I don't have a specific idea, but the idea could come together 00:22:18.000 --> 00:22:22.000 if I have a topic, people, and a data set. That's true, right? 00:22:22.000 --> 00:22:27.000 Yeah, I would say, in general, it's never too early to go ahead. 
00:22:27.000 --> 00:22:44.000 And if you have even just an inkling as to the type of direction you want to take a project, I think it's totally fine to go in and put that in Slack. You can connect with somebody, you all can connect then on, like, you know, finding datasets, all the way to if you already 00:22:44.000 --> 00:22:47.000 have one in mind. So that's totally fine, I would say. 00:22:47.000 --> 00:22:55.000 Just in general, though, the data collection part of the project should not be too time-consuming 00:22:55.000 --> 00:23:06.000 at the end of the day. So if you find yourself thinking that the data collection or getting a data set is going to be particularly difficult, or particularly time-consuming, send me a Slack message, we'll meet, 00:23:06.000 --> 00:23:10.000 we'll talk, and I can kinda let you know if we think that's 00:23:10.000 --> 00:23:16.000 within the realm of normal time expectations. 00:23:16.000 --> 00:23:18.000 Sounds good. Thank you. 00:23:18.000 --> 00:23:30.000 Quick follow-up question to that: I noticed that in Slack there are a number of channels that already exist for various different related projects. Should we be joining the existing channels when they are available? 00:23:30.000 --> 00:23:36.000 Or should we be trying to create new channels for this particular cohort? 00:23:36.000 --> 00:23:41.000 Yes, so I would say, no need to join past channels. 00:23:41.000 --> 00:23:44.000 We have entirely too many. So what we have done in the past is, I will typically just, like, make channels. 00:23:44.000 --> 00:23:50.000 But given that everyone's gonna be forming teams this way, 00:23:50.000 --> 00:23:53.000 if you have a team, and it's solid, go ahead and create a Slack channel. 00:23:53.000 --> 00:23:54.000 That's totally fine. My request would be to loop me in. 00:23:54.000 --> 00:24:13.000 So if you create a Slack channel with group members, if you could just add me on that as well, that would be great. 
00:24:13.000 --> 00:24:18.000 Is synthetic data allowed? 00:24:18.000 --> 00:24:25.000 That's a good question. 00:24:25.000 --> 00:24:28.000 So in the past we've had synthetic data 00:24:28.000 --> 00:24:29.000 that was, like, provided as part of, like, a corporate partner 00:24:29.000 --> 00:24:37.000 submitted problem. So people have worked on that, and that's been fine, cause, like, it's been submitted by somebody. 00:24:37.000 --> 00:24:48.000 I don't think we have that this year. I would say that you probably want to stick to, like, a real-world dataset. Even in the past, when we've had, like, industry partners 00:24:48.000 --> 00:24:58.000 submit synthetic data sets, those projects, like, outside of that particular company, they don't go over well with the judges, because, like, they're synthetic. 00:24:58.000 --> 00:25:03.000 So they're a little bit easier than real-world data, because you don't have to do as much cleaning, or the cleaning 00:25:03.000 --> 00:25:07.000 you do have to do is very, like, it's generated. 00:25:07.000 --> 00:25:12.000 So it's a little bit easier to do than, like, cleaning real-world data, which is pretty messy. 00:25:12.000 --> 00:25:34.000 So I would advise to stay away from synthetic data unless it's, like, a part of some weird Google challenge or something. 00:25:34.000 --> 00:25:37.000 Great. Well, it seems like there's no other questions. 00:25:37.000 --> 00:25:48.000 Let me just end with this. Again, I really want to reiterate that we expect everyone to come in with different experiences with coding, 00:25:48.000 --> 00:25:53.000 any sort of statistics training, all that. So this might be your first time setting up Python. 00:25:53.000 --> 00:25:59.000 I know, Matt, you're about to say all this, but if this project suddenly starts to feel like it's overwhelming, please, 00:25:59.000 --> 00:26:02.000 I encourage you to reach out. I've been in that position. 
00:26:02.000 --> 00:26:13.000 I wanna make sure that we're setting kind of reasonable expectations on this, and also finding a group that you feel like, you know, is having similar experiences to you. 00:26:13.000 --> 00:26:23.000 That's all totally fine. I will say that this month I'm a little bit busier than normal with my own work. 00:26:23.000 --> 00:26:24.000 So message me anytime on Slack. I will be most responsive, 00:26:24.000 --> 00:26:36.000 or I might just follow back up with you, basically after work hours, or pretty early in the morning. So if you message me in midday and I'm not getting back to you immediately, that's why. 00:26:36.000 --> 00:26:41.000 But I will get back to you as soon as possible. Otherwise, really looking forward to meeting everyone. 00:26:41.000 --> 00:26:45.000 The projects are always awesome, and I can't wait. 00:26:45.000 --> 00:26:51.000 Thanks, Alec. Go enjoy the rest of your evening. 00:26:51.000 --> 00:26:55.000 Okay, so that's that. And then don't worry. 00:26:55.000 --> 00:27:02.000 I will be posting Alec's slides. You know, he's already sent them over to me, so I'll be uploading those with the video. 00:27:02.000 --> 00:27:07.000 So the rest of the boot camp is what we're going to talk about now, and this is going to include, like, just anything that's not related to the project. 00:27:07.000 --> 00:27:19.000 So here's sort of the format: we're gonna have, after today, 11 live lectures and 11 problem solving sessions. 00:27:19.000 --> 00:27:36.000 We're going to talk about each of those. Also, a follow-up: all Zoom links can be found either in your Erdős profile or on the course website, which you must have figured out because you're here. And then the other thing that's really important 
00:27:36.000 --> 00:27:46.000 is, if you're interested in what we're doing, or you want to see a PDF version of the schedule, maybe you want to print it out for some reason, there are PDFs of both the syllabus, with the topics 00:27:46.000 --> 00:27:47.000 that get covered in all the lecture content, as well as the schedule, which can be found on the course website. 00:27:47.000 --> 00:27:52.000 And I've sort of pointed out what they look like on that website. 00:27:52.000 --> 00:28:03.000 They're just these little file icons; one says syllabus and one says schedule. So first let's talk about the lectures. I'll give you a brief rundown of what they are, and then we'll pause, and if you have questions about how lectures 00:28:03.000 --> 00:28:15.000 will work moving forward, feel free to ask. So there's going to be a live lecture every Monday, Tuesday, Wednesday, Thursday, until May twenty-fifth, and I believe I got that date right. 00:28:15.000 --> 00:28:19.000 These are gonna be live, like the one you're in right now. 00:28:19.000 --> 00:28:23.000 But, as I said, they're gonna be recorded and uploaded. 00:28:23.000 --> 00:28:25.000 And, as I said earlier, I'll try and do that as quickly as I can. 00:28:25.000 --> 00:28:38.000 But sometimes, if there's a particularly strenuous lecture, I might be tired and want to go sit on my couch and not upload the video. 00:28:38.000 --> 00:28:40.000 But I always try and have it done by the next day. That being said, these are 11 lectures, and if you've gone through and looked at some of the content before we started today, there's a lot of content. 00:28:40.000 --> 00:28:45.000 It's not feasible for us to cover everything
00:28:45.000 --> 00:28:51.000 I've made lecture content on. In the live lectures in the past, 00:28:51.000 --> 00:28:55.000 these have mostly been, like, we go through the core stuff that's really important. 00:28:55.000 --> 00:29:07.000 And then it also serves as a place for people to ask questions, to get sort of the basics down. That being said, there are also pre-recorded lectures for every single Jupyter notebook. 00:29:07.000 --> 00:29:14.000 So there's something like 70 Jupyter notebooks, or maybe 60 to 70 Jupyter notebooks. 00:29:14.000 --> 00:29:28.000 All of them already have a lecture video of me going through them and coding and commenting on the notes. So if there's something that you wish we would have talked about in live lecture, but we didn't, there is a video of it that you can go through on your own time, and then if 00:29:28.000 --> 00:29:34.000 you have questions you can always message me on Slack and I'll do my best to answer through there, or you can come to office hours. 00:29:34.000 --> 00:29:35.000 I will be having live office hours at some point during the boot camp. 00:29:35.000 --> 00:29:44.000 I haven't scheduled them yet, but you can come to those and ask me questions, and I'll do my best to answer. 00:29:44.000 --> 00:29:59.000 So before talking about problem sessions, are there any questions about the lectures? 00:29:59.000 --> 00:30:00.000 I do see a question about the projects. So thanks for all the questions. 00:30:00.000 --> 00:30:08.000 "Sorry, I just got in. My question: are there grading rubrics for the final project?" So there are. 00:30:08.000 --> 00:30:16.000 We typically judge on creativity, technical difficulty, I think one more thing, and then teamwork. 00:30:16.000 --> 00:30:21.000 We found that judging teamwork is a little bit difficult in this sort of asynchronous format.
00:30:21.000 --> 00:30:27.000 So we may be tweaking that this year, but we haven't talked about it yet. Those are sort of the main things, and then we sort of just give an overall grade. 00:30:27.000 --> 00:30:34.000 I would say, from my perspective, one of the things that weighs heaviest on your performance is how well your presentation goes. 00:30:34.000 --> 00:30:39.000 So, like, the little 5-minute thing that you're gonna upload. 00:30:39.000 --> 00:30:42.000 So don't save that until the very end and rush through it. 00:30:42.000 --> 00:30:53.000 Give that some thought and time as well, because you do all this stuff, and then these judges are busy people, so they're not gonna have time to go through everything. They may just look at the video and then sort of glance through the rest 00:30:53.000 --> 00:31:07.000 of the stuff. So if your video is really good, then you'll probably get a better score than if you, you know, spend all your time on the project and then very little time on the video. Okay, are there questions about the lectures? I guess that wasn't about the lectures. 00:31:07.000 --> 00:31:13.000 But are there questions about the lectures? 00:31:13.000 --> 00:31:14.000 Yeah. 00:31:14.000 --> 00:31:20.000 Sorry, I actually have a question. You said that there's 70 live lectures on the repository, so I was wondering if we have to do them all to be successful in the course, or what is the right balance, 00:31:20.000 --> 00:31:26.000 you know. 00:31:26.000 --> 00:31:31.000 So I would say that we're gonna go through the most important stuff in the live lectures. 00:31:31.000 --> 00:31:40.000 Other things are, like, we're not gonna be able to cover the entirety of what I thought was important to know for linear regression. 00:31:40.000 --> 00:31:43.000 But maybe you're not interested in linear regression. Like, maybe you want to work in roles or do projects that focus on something like classification.
00:31:43.000 --> 00:31:49.000 And so you might want to watch those videos that we're unable to cover on classification, as opposed to watching all the ones on regression. 00:31:49.000 --> 00:31:52.000 Or there's a couple of new videos this year that I've added on time series forecasting. 00:31:52.000 --> 00:32:01.000 Maybe you're interested in roles that focus on forecasting, 00:32:01.000 --> 00:32:03.000 and you want to watch these additional ones that we're not gonna have time to cover in live lecture. 00:32:03.000 --> 00:32:12.000 But you know you wanna know everything you want so sort of like I'm doing my best to give you the very base of, like, you will maybe go to a data science interview, 00:32:12.000 --> 00:32:24.000 and one of the questions, like, I've had an interview where one of the questions is basically just: what's this algorithm? 00:32:24.000 --> 00:32:26.000 How does it work? What are the assumptions? 00:32:26.000 --> 00:32:37.000 And so my goal is that you, if you come to these live lectures, will understand the basics of the ones that people ask about, and then the ones that I'm unable to 00:32:37.000 --> 00:32:39.000 cover are maybe a little bit deeper than what they might expect you to know as a base. 00:32:39.000 --> 00:32:43.000 So, then. 00:32:43.000 --> 00:32:47.000 There is also a question from Nathan: "I have a question." 00:32:47.000 --> 00:32:52.000 Okay, looks like that was answered. But Nathan's question was, "I have a question about notebook setup. 00:32:52.000 --> 00:32:53.000 Is there a place where the notebooks that are used for each lecture are written down?" 00:32:53.000 --> 00:33:04.000 Yes, so I'm going to be getting to that. But as Kirthen answered, everything's on the GitHub page. Yup. 00:33:04.000 --> 00:33:09.000 Okay. Any other lecture questions?
00:33:09.000 --> 00:33:20.000 And kinda in the vein of what was just asked previously: it seems like there's a baseline of, you know, if you don't even know what linear regression is, okay, it's a bit of a problem, and then beyond that there's extra stuff that's 00:33:20.000 --> 00:33:24.000 good to know. Is there much of an indicator, slash, 00:33:24.000 --> 00:33:35.000 could we ask staff in the program about opinions to the effect of, oh, if you're working in this particular industry, they might care more about, like, time series versus this versus that? 00:33:35.000 --> 00:33:39.000 So job postings will typically tell you what they expect you to know. 00:33:39.000 --> 00:33:48.000 So if it was a role where you're going to be doing a lot of time series, they usually would want you to know that coming in, so they would put it in their job posting. 00:33:48.000 --> 00:33:57.000 So for any particular job, that's where you're gonna want to look to see what things you're expected to know. At the end of the day, 00:33:57.000 --> 00:33:58.000 if you're interested in certain types of problems, there are jobs that exist that work solely with that problem. 00:33:58.000 --> 00:34:09.000 And then you kind of just need to go through and look through the job postings and find the ones that do that. 00:34:09.000 --> 00:34:18.000 So if you're someone that's interested in doing cutting-edge neural network things for video or for audio or for images, 00:34:18.000 --> 00:34:28.000 then you might lean more towards spending your time learning that stuff and then applying to those types of jobs when they pop up. Like, I've seen job postings for TikTok 00:34:28.000 --> 00:34:40.000 that explicitly asked for that kind of experience. I've also seen business analyst jobs where they specifically asked for things like linear regression and data analysis tools and that sort of thing.
00:34:40.000 --> 00:34:45.000 So I would say, just find the types of problems that you're interested in working on. 00:34:45.000 --> 00:34:55.000 Learn those skills. And then, once you've mastered those, and you think you have time to learn other things, you can branch out and learn additional algorithms as you'd like. 00:34:55.000 --> 00:35:00.000 So, Steven's asking, "Would you recommend working through the content before the live lecture?" 00:35:00.000 --> 00:35:08.000 So that's entirely up to you. Some people learn best by actually coming and watching the lecture and trying to consume it live. 00:35:08.000 --> 00:35:14.000 Some people learn best by trying to, you know, watch the video on their own first, then come back and see it for a second time. 00:35:14.000 --> 00:35:25.000 So, Stephen, if that's what works best for you, then I would try to do that. Sometimes we're going to be covering a lot of content, and some of my videos are kind of long in the pre-recorded ones. 00:35:25.000 --> 00:35:40.000 So I understand if you're unable to watch it ahead of time. I think you just gotta know what works best for your learning and do your best to do that with the time that you have available. It's kind of like what Alex said for the project. The project, combined with the 00:35:40.000 --> 00:35:55.000 lectures and the problem sessions, can all be a lot, and maybe you only allocated so much time for the boot camp, which is perfectly fine, because a lot of you are PhD and master's students that are still trying to finish coursework and research. Do your best to finish what you can finish, and just do it 00:35:55.000 --> 00:36:02.000 to the best of your ability. That's my opinion, you know. 00:36:02.000 --> 00:36:10.000 Alright! Are there any other questions about lecture? 00:36:10.000 --> 00:36:21.000 Sorry, I have a question about the
pre-recorded videos and the direction to those videos. 00:36:21.000 --> 00:36:26.000 If you can, please show us. 00:36:26.000 --> 00:36:33.000 The preparation materials, I mean, for the boot camp. 00:36:33.000 --> 00:36:34.000 Uh-huh. 00:36:34.000 --> 00:36:42.000 So all the videos are available on the website, like, if you click on the website link. 00:36:42.000 --> 00:36:43.000 Bye! 00:36:43.000 --> 00:36:54.000 If you go here, all the videos are available under Program Content. 00:36:54.000 --> 00:36:56.000 Okay. Yeah. 00:36:56.000 --> 00:36:57.000 Yeah, I'm here. Okay. 00:36:57.000 --> 00:37:01.000 So here, these are all the videos. You just find the notebook that you're looking for, it'll be in one of the names, and then you'll click play. In terms of the notebooks themselves, we'll see what those look like a little bit later, 00:37:01.000 --> 00:37:07.000 you know, when we talk about the repository and then Jupyter notebooks. 00:37:07.000 --> 00:37:15.000 If you're interested in the prerequisite stuff, you'll click on First Steps here, and then that will take you to all the Python prep stuff. 00:37:15.000 --> 00:37:20.000 Yeah, so the program content contains videos that you have already uploaded. 00:37:20.000 --> 00:37:21.000 Yeah. 00:37:21.000 --> 00:37:30.000 I see how many of? 00:37:30.000 --> 00:37:31.000 No. So if you click on "load more videos," that loads more videos. 00:37:31.000 --> 00:37:32.000 Nearly 18 videos, right? No, more. Yeah. 00:37:32.000 --> 00:37:36.000 Yup! 00:37:36.000 --> 00:37:43.000 Okay. So we gotta watch all of them, probably, and be more ready, or? 00:37:43.000 --> 00:37:44.000 I mean, it's up to you. If you wanna watch all of them, 00:37:44.000 --> 00:37:45.000 Yeah. 00:37:45.000 --> 00:37:51.000 then watch all of them. If you just feel like you're comfortable doing the live lectures, just do the live lectures. 00:37:51.000 --> 00:37:53.000 Just do what works for you.
00:37:53.000 --> 00:37:58.000 Okay. So the live lecture kite line will be different. 00:37:58.000 --> 00:37:59.000 Right. 00:37:59.000 --> 00:38:06.000 So all the content's gonna be the same. It's just that I'm not gonna have time to cover every single thing, because some of it dives a little bit deeper. 00:38:06.000 --> 00:38:11.000 Some of it is presenting things that aren't the core-level content. 00:38:11.000 --> 00:38:12.000 Right. 00:38:12.000 --> 00:38:15.000 This is newer stuff that's being developed. 00:38:15.000 --> 00:38:16.000 Right. Thank you. 00:38:16.000 --> 00:38:19.000 Yup! Yup! 00:38:19.000 --> 00:38:31.000 Okay. So the next thing: our problem sessions. These are 1-hour sessions where there are going to be problem sets that cover the things that are talked about in lecture. 00:38:31.000 --> 00:38:36.000 So, for instance, tomorrow there is a problem session. That's our first one. It's not gonna be talking about data science. 00:38:36.000 --> 00:38:51.000 The first thing is just checking that everybody in your small group is all set up to go with Python and Jupyter notebooks, and then the rest is just getting you ready to work in Python by looking at basic Python skills that you could work on together to get 00:38:51.000 --> 00:38:54.000 a sense of what it's like to work on these problems. 00:38:54.000 --> 00:38:55.000 These are going to be every Monday, Tuesday, Wednesday, Thursday, 00:38:55.000 --> 00:39:01.000 starting tomorrow, the ninth, going all the way to our last day. 00:39:01.000 --> 00:39:04.000 So there are 2 sessions, but they're identical. 00:39:04.000 --> 00:39:06.000 This is to accommodate schedules, because some people maybe are available in the mid to late afternoon, but not in the morning, and vice versa. 00:39:06.000 --> 00:39:16.000 So every day that we have a problem session, there's one from 10 to 11 AM 00:39:16.000 --> 00:39:19.000 Eastern time, and then also one from 4 to 5 PM
00:39:19.000 --> 00:39:22.000 Eastern time. These are going to cover the exact same content. 00:39:22.000 --> 00:39:26.000 The only thing that's different is we'll have different TAs. 00:39:26.000 --> 00:39:27.000 But the notebooks will be the same. 00:39:27.000 --> 00:39:28.000 The things that are talked about are the same. Just go to one of them, 00:39:28.000 --> 00:39:50.000 whatever works best in your schedule. So if you're someone who can't come to the morning, go to the afternoon. Based on the survey that I sent out last week, I'm expecting that we'll have about a 50-50 split here, so half in 00:39:50.000 --> 00:39:56.000 the morning, half in the evening. If it turns out not to be that way, I might move some of the TAs around. 00:39:56.000 --> 00:40:05.000 I'll keep an eye on it this week. So while you're in a problem session, you're going to be in small groups, and TAs will be there to rotate between these groups. 00:40:05.000 --> 00:40:06.000 So basically the problem sessions are on Zoom, and you'll just be split up into a breakout room. 00:40:06.000 --> 00:40:18.000 There'll be randomly assigned groups each time of 4 to 6 people, and then the TAs will be there, and there might not be one in your room. 00:40:18.000 --> 00:40:34.000 But if you have a question you can try and flag one of us on the Slack, or something like that, or just try and move on and wait until a TA gets there. TAs will rotate between these small groups, answer your direct questions, and then, if you're stuck on something, try and provide guiding questions, like, 00:40:34.000 --> 00:40:38.000 "Oh, well, have you thought about trying this approach?" That sort of thing. 00:40:38.000 --> 00:40:39.000 So these TAs are really just there to help you get through it. 00:40:39.000 --> 00:40:41.000 They're not going to give you all the answers. 00:40:41.000 --> 00:40:46.000 They're not going to do the exercises for you.
00:40:46.000 --> 00:40:47.000 The goal of the problem sessions is to sort of test your working knowledge, 00:40:47.000 --> 00:40:54.000 and then you guys work together on these problems to figure out how to do things. 00:40:54.000 --> 00:41:00.000 These things go best when people are talking. I know in the past I've seen people in problem sessions where everybody's really quiet and it's really just one person doing it all. 00:41:00.000 --> 00:41:09.000 Everybody else watches silently. That is not a fun experience. It works well 00:41:09.000 --> 00:41:14.000 when you guys are talking to each other, brainstorming ideas, taking turns coding, 00:41:14.000 --> 00:41:20.000 so everybody gets a chance to practice. In addition to the problem sessions, there's also what are known as prep notebooks, and we'll see these in a little bit. 00:41:20.000 --> 00:41:26.000 Every problem session has a prep notebook. 00:41:26.000 --> 00:41:35.000 These are entirely optional. Some people found it nice to have a notebook that practices the Python stuff that will show up in that problem session. 00:41:35.000 --> 00:41:42.000 So, for instance, if I were to expect that for loops are gonna happen a lot in today's problem session, the prep notebook would have a couple of questions for you to practice your for loops. 00:41:42.000 --> 00:41:51.000 These are optional. The solutions are provided, and you can check your work. Again, 00:41:51.000 --> 00:41:56.000 optional, you don't have to do it. Problem sessions start tomorrow. 00:41:56.000 --> 00:42:06.000 Are there any questions about the problem sessions? 00:42:06.000 --> 00:42:11.000 Do we know the topics of problem sessions in advance or no? 00:42:11.000 --> 00:42:17.000 The problem session topics are always what was covered the previous day in lecture.
So since we're not covering data science today, tomorrow's problem session is about setup and just basic Python stuff. 00:42:17.000 --> 00:42:25.000 Then tomorrow we'll be covering data collection in lecture. 00:42:25.000 --> 00:42:26.000 So Wednesday's problem sessions will be about data collection. 00:42:26.000 --> 00:42:30.000 And it works like that. 00:42:30.000 --> 00:42:32.000 Great. Thank you. 00:42:32.000 --> 00:42:35.000 Yup, and what's being covered in a particular problem 00:42:35.000 --> 00:42:45.000 session can be found in that schedule PDF on the website. 00:42:45.000 --> 00:42:46.000 Yeah. 00:42:46.000 --> 00:42:47.000 I have a question. Is it okay to do, 00:42:47.000 --> 00:42:54.000 say, maybe the morning session on one day, maybe the evening session on another day? 00:42:54.000 --> 00:43:02.000 Yeah. So if you had something like, you know, one person brought up last week, "I teach on Tuesdays and Thursdays, but not Mondays and Wednesdays." 00:43:02.000 --> 00:43:07.000 So if you have something like that, it's okay to switch between. Just go with what works with your schedule. 00:43:07.000 --> 00:43:15.000 If you're typically going to be, let's say, available at the AM session Monday, Tuesday, Wednesday, Thursday, try to go to that one consistently, but if your 00:43:15.000 --> 00:43:17.000 schedule makes it so that you have to rotate, 00:43:17.000 --> 00:43:20.000 that's perfectly fine. 00:43:20.000 --> 00:43:21.000 Yup! 00:43:21.000 --> 00:43:25.000 Thanks. 00:43:25.000 --> 00:43:32.000 So do we need to decide in advance which sessions we are going to attend, or can we go same day? 00:43:32.000 --> 00:43:38.000 Well, you'll need to decide before tomorrow, probably, right? 00:43:38.000 --> 00:43:39.000 Yeah, okay. 00:43:39.000 --> 00:43:43.000 'Cause that's when they start.
But, yeah, after that, you know, just go with what works in your schedule. 00:43:43.000 --> 00:43:45.000 It'll be easiest for us on the Institute side if you go to the same one every time, because then we can plan: 00:43:45.000 --> 00:43:51.000 okay, we expect to have this many people there. But we understand that schedules change for various reasons. 00:43:51.000 --> 00:43:57.000 And so, you know, just do what you're able to do. 00:43:57.000 --> 00:44:00.000 I mean, do we need to vote somewhere on which session we are going to attend? No. 00:44:00.000 --> 00:44:15.000 No, nope, there's not an assigned thing. Just do what works best with your schedule. We got a sense of what the attendance would be from a survey I sent out last week, so I tried my best to assign the TAs according to that survey, but we'll 00:44:15.000 --> 00:44:25.000 adjust if we need to. 00:44:25.000 --> 00:44:34.000 Okay. Any other questions? 00:44:34.000 --> 00:44:39.000 Okay. So the last bit we're gonna talk about is making sure we're all set up and ready to go for Python and data science. 00:44:39.000 --> 00:44:49.000 So there are 2 things that need to be done. You need to clone the repository onto your computer, and you need to be able to open a Jupyter notebook. 00:44:49.000 --> 00:44:50.000 So the GitHub repository: the link for this can be found on the course website. 00:44:50.000 --> 00:45:05.000 I'll show you that in a second. It will look like this when you're able to access it, and it will contain all the educational content for the boot camp. 00:45:05.000 --> 00:45:13.000 So I'm gonna hit escape. On the website, the repository link can be found here under Program Content. 00:45:13.000 --> 00:45:16.000 So if you click on this, I'm not okay.
00:45:16.000 --> 00:45:26.000 So I'm signed in right now, so it will look like this when you click on it, and you can see it has things like lectures, practice problems, which are sort of just for your own time, and problem sessions. 00:45:26.000 --> 00:45:33.000 So a big thing that I think people have missed out on in previous years is that every one of these folders has what's known as a README file, and the README file tells you everything you need to know about the repository. So here, 00:45:33.000 --> 00:45:48.000 the README file shows you what all the folders are, but what could be useful is in the lectures folder. If I remember correctly, that README file has a suggested notebook order. 00:45:48.000 --> 00:46:00.000 So if you're somebody who's watching this video asynchronously and trying to figure out what to go through, this is sort of the suggested order of all of the notebooks, like maybe you're reading it like a book or something like that. 00:46:00.000 --> 00:46:08.000 So, you know, this can be useful, and I think in the past I've updated it with what we're able to cover, and I'll probably keep doing that. 00:46:08.000 --> 00:46:15.000 So whatever we're able to cover in the lectures as we go along, I try and do my best to put it in the README files. 00:46:15.000 --> 00:46:20.000 So this is the GitHub repository. You can find it here. 00:46:20.000 --> 00:46:26.000 You click on this link. The key thing is, you need to be signed into your GitHub account 00:46:26.000 --> 00:46:29.000 in whatever web browser you're using. So let's go back to the slideshow. 00:46:29.000 --> 00:46:35.000 So how do you get access to the GitHub repository? 00:46:35.000 --> 00:46:39.000 If you haven't already, these are the steps. So step one: 00:46:39.000 --> 00:46:45.000 in whatever web browser you use, you need to sign into your GitHub account, and I guess this would technically be step 2:
00:46:45.000 --> 00:46:47.000 You need to go to the link that I just showed you. 00:46:47.000 --> 00:46:53.000 Then you can clone the repository on your computer. 00:46:53.000 --> 00:46:57.000 I'm not gonna show you how to do this, but there are instructions for this 00:46:57.000 --> 00:47:12.000 in the First Steps section of the website. So once you've cloned the repository on your computer, you'll be able to access it and work through the notebooks like you're gonna see me do in the lectures. Every day of the boot camp, before you start a 00:47:12.000 --> 00:47:20.000 lecture or the problem session, you're also going to need to pull the updates. I'm going to be updating the repository, meaning that I'm going to add files to it 00:47:20.000 --> 00:47:29.000 every day. So these files are going to be the new problem sessions as well as the lecture notebooks that we completed. 00:47:29.000 --> 00:47:41.000 I'm going to upload my version that I complete live in lecture. So, for instance, somebody may ask a question, and as a way to answer that question, I may demonstrate by typing up some Python code. 00:47:41.000 --> 00:47:46.000 So that would be in the live version of the notebook that I'd be uploading. 00:47:46.000 --> 00:48:05.000 So every time I update the repository, you then need to pull those updates into your version of the repository, and there are instructions for that in the file "Getting Started with GitHub" in the First Steps section of the website. Then this is something you don't necessarily have to do, but 00:48:05.000 --> 00:48:20.000 I think everybody in the past has found it useful to do. Once you have the repository, like your cloned version, it's useful to make a copy of it in your folder for your edits. You're not going to be able to upload any changes you make because you only have what's known as 00:48:20.000 --> 00:48:25.000 read access, meaning you can just clone the files and then look at them and make your edits.
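NOTE Editor's sketch, not part of the recording. The working-copy step described above can be done by hand in a file browser; the snippet below is one possible way to do the same thing from Python with the standard library. The folder name `code-2023` and the helper `make_working_copy` are assumptions made for illustration, and the demo runs on a throwaway folder rather than a real clone.

```python
import shutil
import tempfile
from pathlib import Path


def make_working_copy(repo_dir: Path, suffix: str = " copy") -> Path:
    """Copy a cloned repository folder next to itself and return the copy's path.

    Hypothetical helper: the boot camp materials do not define it.
    """
    copy_dir = repo_dir.with_name(repo_dir.name + suffix)
    # copytree fails if the destination already exists, so an existing
    # working copy with your edits is never silently overwritten.
    shutil.copytree(repo_dir, copy_dir)
    return copy_dir


# Demo on a temporary folder standing in for the real clone.
with tempfile.TemporaryDirectory() as tmp:
    clone = Path(tmp) / "code-2023"  # assumed folder name
    (clone / "lectures").mkdir(parents=True)
    (clone / "lectures" / "intro.ipynb").write_text("{}")

    working = make_working_copy(clone)
    working_name = working.name
    copied_names = sorted(p.name for p in working.rglob("*"))
```

Edits would then go in the copied folder, while the original clone stays untouched so that pulling updates cannot collide with your changes.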
00:48:25.000 --> 00:48:34.000 But to make sure that you're not getting any of your files accidentally overwritten by any pulls you do, you may want to make a copy of the folder on your computer, 00:48:34.000 --> 00:48:39.000 just something like "code-2023 copy". 00:48:39.000 --> 00:48:44.000 Make all your edits in that copied version, and then keep the other version as, like, 00:48:44.000 --> 00:48:54.000 this is what the GitHub currently is online. So one issue you may encounter is what's known as a 404 issue. 00:48:54.000 --> 00:48:55.000 So when you go to the link, you may see something like this. 00:48:55.000 --> 00:49:02.000 If you see something like this, there are 3 steps that you should take, in order. 00:49:02.000 --> 00:49:08.000 First, check that you're signed into your GitHub account when you click on the link. 00:49:08.000 --> 00:49:17.000 GitHub is really nice about this, in that they will prompt you, like, you might need to sign in when you see this. 00:49:17.000 --> 00:49:26.000 The second thing is, if you are signed in and you're still seeing the 404 error, check that you have added your GitHub link to your Erdos profile, and then, after that, message Olivia about being added to the repository. So Olivia 00:49:26.000 --> 00:49:29.000 is in charge of adding everybody to the repository. 00:49:29.000 --> 00:49:32.000 She's done her best to do that so far leading up to the boot camp. 00:49:32.000 --> 00:49:35.000 If you're somebody who hasn't yet been added to the repository, you need to message Olivia on Slack and she'll 00:49:35.000 --> 00:49:43.000 make sure that you get access to the repository. 00:49:43.000 --> 00:49:45.000 Okay, so before we go into Jupyter notebooks, are there any questions about sort of the GitHub repository?
00:49:45.000 --> 00:50:07.000 Just as a concept. Please hold any "I'm having trouble getting access" issues until we're all the way done with the slides, and then I'll stick around and help people resolve that issue after we're done going through the slides. 00:50:07.000 --> 00:50:08.000 So how often should we pull updates from the repository, 00:50:08.000 --> 00:50:10.000 do you suggest? 00:50:10.000 --> 00:50:16.000 I would say every day of the boot camp. So before your problem session. I don't know that I've added problem session one yet. 00:50:16.000 --> 00:50:25.000 I'll add it tonight or tomorrow morning. Pull the repository to get that. So every day of the boot camp. 00:50:25.000 --> 00:50:39.000 Okay. 00:50:39.000 --> 00:50:50.000 Any other GitHub-related questions? 00:50:50.000 --> 00:50:56.000 Okay, great. So the next thing I said you need to be able to do is open a Jupyter notebook. 00:50:56.000 --> 00:51:01.000 So all of the educational content is typed up in Jupyter notebooks. 00:51:01.000 --> 00:51:09.000 If you haven't worked with these before, they're really great. They allow you to type sort of a combination of Word-document-style text and Python code. 00:51:09.000 --> 00:51:17.000 So we're gonna go ahead and look at an example now. 00:51:17.000 --> 00:51:24.000 So this is what the repository looks like when you've cloned it onto your computer and then opened it in a Jupyter environment. 00:51:24.000 --> 00:51:26.000 And now that I've opened it in a Jupyter environment, I can go into the lectures and then open a Jupyter notebook. 00:51:26.000 --> 00:51:40.000 So, for instance, let's start with Introduction. If you are consuming this asynchronously, maybe you'd watch the welcome video first. 00:51:40.000 --> 00:51:41.000 That gives you sort of the same rundown that I'm giving you right now. 00:51:41.000 --> 00:51:43.000 Okay, so this is a Jupyter notebook.
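NOTE Editor's sketch, not part of the recording. The mix of written text and Python code described above comes from notebook cells: text cells hold the explanation and code cells hold runnable Python. A minimal illustration of two code cells, assuming a standard Jupyter notebook with a Python kernel; the variable names `result` and `message` are illustrative and do not come from the lecture notebook.

```python
# Cell 1: Jupyter displays the value of the last expression in a code cell.
# (When run as a plain script, the bare expression below shows nothing.)
result = 2 + 2
result

# Cell 2: print() writes its output directly beneath the cell.
message = "Hello, World!"
print(message)
```

If both cells run without an error in your own environment, the notebook setup is working.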
You can see here that it's got all this nice written text. 00:51:43.000 --> 00:51:56.000 But I can also do something like run Python code, like 2 plus 2. 00:51:56.000 --> 00:52:01.000 And you can see that once it runs it will say 4. 00:52:01.000 --> 00:52:09.000 I could also do things like print things, like "Hello, World," and it prints "Hello, World!" 00:52:09.000 --> 00:52:15.000 So Jupyter notebooks are nice in that they let you do both writing and code. 00:52:15.000 --> 00:52:22.000 So if you're doing a report for a business colleague or something, you can write explanations in Word-styled chunks 00:52:22.000 --> 00:52:31.000 and then run code in sort of code chunks. So that's a nice thing about Jupyter notebooks. 00:52:31.000 --> 00:52:39.000 And if you're able to open a Jupyter notebook, you should be able to see stuff like this once you've cloned the repository. 00:52:39.000 --> 00:52:44.000 Okay. So let's go back to the slides. 00:52:44.000 --> 00:52:45.000 So how do you get set up with Jupyter notebooks for the boot camp? 00:52:45.000 --> 00:52:54.000 Follow step 3 under First Steps on the website. That has instructions. 00:52:54.000 --> 00:53:08.000 There's a couple of different ways you could do it. If you're brand new to Python, I always suggest using the Anaconda Navigator to get started, and there's a link here in the slides that I'll upload that takes you to the installation page. 00:53:08.000 --> 00:53:15.000 It also exists on the course website under First Steps, 00:53:15.000 --> 00:53:21.000 step 3. Another way you could do it is, maybe you're someone who's more familiar with Python and writing code and that sort of thing, 00:53:21.000 --> 00:53:22.000 and you don't want another piece of software installed on your computer.
00:53:22.000 --> 00:53:31.000 You can also install Jupyter like any other Python package by going to their website. 00:53:31.000 --> 00:53:36.000 And they have instructions on how to install it, and then access the Jupyter Notebook. 00:53:36.000 --> 00:53:44.000 So let's go back to the slides. 00:53:44.000 --> 00:53:50.000 And then let me get my chats pulled up, so I see Steven has a question. 00:53:50.000 --> 00:53:54.000 I tried using both Anaconda and VS Code and find that I prefer VS Code. 00:53:54.000 --> 00:54:01.000 Will there be any issues with completing course content? So nope, the only thing I care about is that you're able to open the Jupyter notebooks. 00:54:01.000 --> 00:54:06.000 It doesn't matter to me how you get there. Just be aware that, like, if you have a VS Code issue, I probably won't be able to help you. Honestly, if you had an Anaconda issue, 00:54:06.000 --> 00:54:13.000 I also might not be able to help you, 'cause I typically just go with the command line or the terminal. 00:54:13.000 --> 00:54:17.000 But I know a little bit more about Anaconda, and I know, like, nothing about VS Code. 00:54:17.000 --> 00:54:28.000 Just be able to open a Jupyter notebook. I don't really care how you get there, just as long as you understand it and you're able to do it. 00:54:28.000 --> 00:54:41.000 Are there any other Jupyter notebook questions? 00:54:41.000 --> 00:54:44.000 Okay. So Laura is asking: the instructions recommend using Homebrew with Anaconda for Mac computers with an M1 chip. 00:54:44.000 --> 00:54:47.000 But there's a new version of Anaconda that claims to be compatible. 00:54:47.000 --> 00:54:53.000 Is there any problem with using the new one? Nope, just use whatever you want to use. 00:54:53.000 --> 00:55:00.000 So the old instructions were written, like, one to 2 years ago, and I just haven't gotten around to updating them.
00:55:00.000 --> 00:55:06.000 So if Anaconda now works with the newer Macs, I also believe now that they have, like, an M2 chip, 00:55:06.000 --> 00:55:11.000 so I'm behind the times. Just use whatever works for your computer. 00:55:11.000 --> 00:55:13.000 If you're able to figure it out on your own, that makes me happy, because I don't have to figure out how to do it. 00:55:13.000 --> 00:55:31.000 So if you want to use, you know, that version of Anaconda, go for it. That's fine with me. 00:55:31.000 --> 00:55:36.000 Okay. If there aren't any other Jupyter questions, I'm gonna open up the floor. 00:55:36.000 --> 00:55:54.000 So just any questions or concerns you have about the boot camp in general. Try and stay away from specific installation questions with things like Git or GitHub or Jupyter. If you're having very particular issues, I'm gonna stick around after the recording's done. Keep these questions 00:55:54.000 --> 00:56:00.000 to just, like, about the boot camp at large, whether it be projects or lectures, problem sessions, that sort of thing. 00:56:00.000 --> 00:56:09.000 I'm here to answer any of those, and I'll stop sharing my screen and open up the floor. 00:56:09.000 --> 00:56:10.000 The question that comes to mind is, let's say, so 00:56:10.000 --> 00:56:20.000 the content's recorded, and there's also videos associated to each notebook, including the stuff that's not, like, done in the lecture. 00:56:20.000 --> 00:56:25.000 I guess, like, let's say, for instance, you know, as somebody who prefers live, 00:56:25.000 --> 00:56:31.000 but if I'm doing stuff on my own, would prefer reading to watching a video, like, just working through the Jupyter notebook without 00:56:31.000 --> 00:56:41.000 necessarily, you know, watching the associated video, or reading, I know there's like a textbook, just a large PDF, how complete are those? Like, is there something where it's, 00:56:41.000 --> 00:56:46.000 like, is there something where it's,
oh, if you only have the notebook, you just missed a huge point, or... 00:56:46.000 --> 00:56:47.000 So all of the notebooks are written so that, in my opinion, you could work through them without the video. 00:56:47.000 --> 00:56:59.000 Some people are more, like, visual or audio learners, and so I think the video works well for them. 00:56:59.000 --> 00:57:04.000 But if you're someone that works best by just reading through something, then just read through it. 00:57:04.000 --> 00:57:11.000 And then, if you think that something's missing, you can always refer back to the video and see if it helps. 00:57:11.000 --> 00:57:28.000 And the other question I have is, so there's this one, like, kinda basically a textbook that you guys put up somewhere on the website. Is that mostly the content that is covered directly live, or does that include the other, like, supplemental content that you said is a bit more optional? 00:57:28.000 --> 00:57:32.000 So I didn't write the textbook 00:57:32.000 --> 00:57:36.000 PDF. That was a previous participant, where, as part of their studying for job interviews, they found it helpful to rewrite it as, like, a PDF 00:57:36.000 --> 00:57:47.000 that they could share, and then they gave it to us and let us put it up so other people could look through it and see if it was useful. 00:57:47.000 --> 00:57:52.000 So I don't know, like, the extent of what Patrick put into it. 00:57:52.000 --> 00:57:53.000 So you'd have to check that out on your own. 00:57:53.000 --> 00:57:54.000 It certainly doesn't have everything. So, like, I added maybe 3 notebooks this year. 00:57:54.000 --> 00:58:03.000 It won't have that content, because Patrick, like, has a real job now. 00:58:03.000 --> 00:58:07.000 So! 00:58:07.000 --> 00:58:08.000 Yup! 00:58:08.000 --> 00:58:17.000 Alright. Thank you. 00:58:17.000 --> 00:58:23.000 So Arlene is asking:
I am interested in doing a music application of data science for the project. 00:58:23.000 --> 00:58:29.000 I see it as possible to do a recommendation type of project where you say I want a happy song with certain lyrics, and it recommends a song. 00:58:29.000 --> 00:58:38.000 Do you know the scope of difficulty to do a music generation type project where you say I want a happy song that is upbeat, and it creates a custom song with your specifications? 00:58:38.000 --> 00:58:48.000 So I would say, in general, doing that from scratch is probably very difficult. If you're doing it from scratch, like, from scratch, where 00:58:48.000 --> 00:58:51.000 you're not using any existing tools, like a ChatGPT-type thing, 00:58:51.000 --> 00:58:59.000 that would be difficult just because you have to have all the audio files to train whatever recommender or generator you're doing, then you'd have to build the model. 00:58:59.000 --> 00:59:13.000 Then you would have to, you know, test the model out. That being said, there probably do exist tools like a ChatGPT. 00:59:13.000 --> 00:59:17.000 I'm sure there's something like a ChatGPT or a DALL-E that probably 00:59:17.000 --> 00:59:34.000 does something like that already. If you were to use something like that, though, your judging might be a little harsher, because while you did, you know, learn how to use the tool, at the end of the day, the person who trained the tool maybe did a lot of 00:59:34.000 --> 00:59:36.000 the work. It just depends on, like, how much you're gonna use 00:59:36.000 --> 01:00:02.000 the existing tool and then retrain it for whatever purposes you're doing and tweak it. So I could see it being very difficult. I mean, I could also see it being very easy, where you just pay to have access to an already existing tool and then use the tool. So it just depends. 01:00:02.000 --> 01:00:10.000 Are there any other questions?
01:00:10.000 --> 01:00:11.000 Irrett is asking: realistically, if we successfully complete this boot camp with a project, 01:00:11.000 --> 01:00:19.000 would that be sufficient to make us a competitive candidate for data science jobs, or does the typical Erdős alum need to do more stuff afterwards? 01:00:19.000 --> 01:00:24.000 So! 01:00:24.000 --> 01:00:31.000 I've seen people who just do the boot camp and then are able to get a job. 01:00:31.000 --> 01:00:38.000 I've also seen people who do the boot camp multiple times, or do the boot camp once and then do extra projects. 01:00:38.000 --> 01:00:40.000 Or maybe they did the boot camp, but they already had a relatively strong, like, data-y background 01:00:40.000 --> 01:00:48.000 from their research work or from previous positions. You know, we've seen all sorts of things. 01:00:48.000 --> 01:00:53.000 I don't have a good sense, from, like, our actual data on who gets jobs, of, you know, what's the probability, that kind of thing. 01:00:53.000 --> 01:01:05.000 It just depends. I would say that the job market right now is a little bit tougher because of all these tech layoffs. 01:01:05.000 --> 01:01:06.000 That's a lot of people entering the job market. 01:01:06.000 --> 01:01:10.000 So! 01:01:10.000 --> 01:01:14.000 The more projects you can do, the better. That also being said, I do understand that there are people here who are entering the job market this summer and want to get a job as quickly as possible. 01:01:14.000 --> 01:01:38.000 Totally understand that. So if that's the case, I would say that it might be best if you, in addition to, you know, taking the time to do the boot camp, while you're doing it here, use it as a time to study for interviews and that sort of thing as well. I know we have existing 01:01:38.000 --> 01:01:40.000 technical interview prep stuff that's available to participants.
01:01:40.000 --> 01:01:46.000 And you can ask about that in the spring cohort channel, and someone who knows where it is will tell you where that stuff is. 01:01:46.000 --> 01:01:51.000 I think it just depends on a combination of your background, 01:01:51.000 --> 01:02:09.000 what the people at the job are looking for, and then honestly, also, like, how good you're able to make your resume so that it, you know, gets past these automatic, like, rejection machines that these places have now. And we have a great team that can help you do that, the career coaching 01:02:09.000 --> 01:02:16.000 team, so contact them through your profile if you need resume help, and they'll be able to help you. 01:02:16.000 --> 01:02:20.000 They want to help you get a job just as much as you want to get a job. 01:02:20.000 --> 01:02:26.000 Maria asks: if I have Python set up on my computer already and am comfortable coding, 01:02:26.000 --> 01:02:30.000 do you still recommend going to tomorrow's problem-solving session? 01:02:30.000 --> 01:02:39.000 It's entirely up to you. So if you feel like you're already good to go, and you just want to wait until we start doing data science-y stuff, that's fine 01:02:39.000 --> 01:02:49.000 by me. I would say it could be useful to go just to meet other people in the boot camp and get, you know, used to working with them. You know, whatever works for you. 01:02:49.000 --> 01:02:54.000 Kirson is asking, what is the ideal time to start working on the project? 01:02:54.000 --> 01:03:03.000 It almost feels like there will be a time crunch. So we have a system that was put in place a few years ago by Lindsay Warrenburg, and then has been taken up by Alec 01:03:03.000 --> 01:03:11.000 now, where we try and do our best to set up, like, attainable goals throughout the first 3 weeks of the boot camp, so that you're not, like, hitting the ground with nothing done after the lectures are done.
01:03:11.000 --> 01:03:16.000 So this week is all about finding a project and finding a team. 01:03:16.000 --> 01:03:28.000 Next week is all about, like, getting your project mentor and doing some initial steps, like finding a data set and exploring that data set. 01:03:28.000 --> 01:03:29.000 And then I can't remember what the last week's about. 01:03:29.000 --> 01:03:36.000 Then after that, I believe you have, like, a week and a half to work on your projects. 01:03:36.000 --> 01:03:39.000 There will be no lectures, no problem sessions. 01:03:39.000 --> 01:03:42.000 So it is a lot, I'm not gonna lie. 01:03:42.000 --> 01:03:58.000 It is a lot of work, but I would say you don't need to worry about actually doing work on the project this week. The biggest concern is probably finding a project that you want to work on and finding a team you want to work with. Then next week, I would say you want to start working a little bit, 01:03:58.000 --> 01:04:05.000 maybe each day, or as you're able to, on your project. 01:04:05.000 --> 01:04:11.000 If that means you work a little bit each day, or just work on the Friday or during the weekend, it's up to you. 01:04:11.000 --> 01:04:17.000 However you work best, you know, that's sort of the answer. 01:04:17.000 --> 01:04:20.000 Yes, so I think maybe Alec forgot, there are going to be project mentors that you get to work with. 01:04:20.000 --> 01:04:33.000 Alec will see what your interests are for your project, and then try his best to give you a project mentor that has expertise in that area. 01:04:33.000 --> 01:04:36.000 Joseph asks: are our solutions to the problem 01:04:36.000 --> 01:04:38.000 session sets that we personally write throughout the month meant to be kept private, or can our work be posted publicly on our GitHub? 01:04:38.000 --> 01:04:49.000 That's a good question. 01:04:49.000 --> 01:04:54.000 So!
01:04:54.000 --> 01:04:59.000 Let's keep it private, just because I might want to reuse some of the stuff for future boot camps, and that being the case, I don't want it to be, like, out there in the ether 01:04:59.000 --> 01:05:11.000 so other people see it. 01:05:11.000 --> 01:05:25.000 Any other questions? 01:05:25.000 --> 01:05:30.000 Okay, great. So as the last thing for the recorded part, just remember the 2 things that I'd like you to make sure you have done by tomorrow. 01:05:30.000 --> 01:05:37.000 Have the GitHub repository cloned on your computer. 01:05:37.000 --> 01:05:47.000 Be able to get to a Jupyter notebook. That will be for the problem sessions, and then in the lecture we're going to start doing actual data stuff, with talking about data collection like somebody asked earlier. 01:05:47.000 --> 01:05:53.000 This will give you a sense of where you can find data sets relatively easily, and also, you know, tools that you can use to scrape data slightly less easily, but still pretty easily.
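For anyone checking their setup before the first problem session, here is a rough sketch of the two cells shown in the notebook demo (2 plus 2, then printing "Hello, World!"). The variable names and the last check are assumptions added for illustration and were not part of the session: `jupyter_core` is assumed to be the package that a typical Jupyter install provides, so that check may report False in an environment that is set up differently.

```python
import importlib.util

# Cell 1 from the demo: evaluate a simple expression.
# In a notebook, typing `2 + 2` alone would display the value under the cell.
result = 2 + 2
print(result)

# Cell 2 from the demo: print a greeting.
greeting = "Hello, World!"
print(greeting)

# Extra check (an assumption, not shown in the session): see whether the
# jupyter_core package can be found in the current Python environment.
has_jupyter = importlib.util.find_spec("jupyter_core") is not None
print("jupyter_core found:", has_jupyter)
```

If both cells run inside a notebook opened from the cloned repository, that should cover the two setup steps listed above.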