Unsupervised Learning Introduction Video Lecture Transcript

This transcript was automatically generated, so there may be discrepancies between the video and the text.

Hi, everybody. Welcome back. In this video we'll introduce the ideas behind unsupervised learning before we get started on it proper.

In supervised learning, so things like regression, classification, and time series, we tried to train an algorithm, or in other words fit a model, to help us predict some variable of interest, y. We used a set of features, X; typically we assumed that there were m features with n observations. We called this supervised learning because we were working under the supervision of having a set of labels: y served as a set of labels, either some value to regress on or some class we were trying to predict. We worked under the supervision of that set, hence the name supervised learning.

There are also instances where we might be interested in understanding data that does not have a set of labels, just features. In this setting we call it unsupervised learning, because we don't have the benefit of the supervision of labels. The idea here typically is that we're trying to understand some underlying structure of the features X, if any. There will come times when you have a data set where there's no target you're trying to predict, but you're just trying to understand the data on its own merits. Other times you may use unsupervised learning techniques in order to make a supervised learning model better.

Two of the main problems in unsupervised learning that we'll have content on are dimension reduction and clustering. Dimension reduction takes a high-dimensional data set, think data with hundreds or thousands of columns, and makes it smaller, with fewer columns. Clustering is for data that you think has a number of groups present within it.
That is, observations are grouped together somehow, and we would like to find those groups in an algorithmic way.

For dimension reduction, as we said, we go from high-dimensional data to lower-dimensional data. Maybe we do this because we want to be able to visualize a high-dimensional data set in some meaningful way. Sometimes, if you have a lot of features, many of those features may actually be noise, so this can be a nice way to remove the noise from the data and get a better model. Sometimes you need to cut down on computational time, since a lot of features may make certain algorithms run more slowly. Other times you do this because you need to save memory space on your computer for other algorithms.

In clustering we're trying to group observations that are similar. As a concrete example, maybe we work in marketing and we think that there are K groups of customers, and we want to market differently to each of those K groups.

OK, so that's a nice introduction to what we're going to cover in these unsupervised learning notebooks. We have a folder for dimension reduction, which has a couple of notebooks on dimension reduction techniques, as well as a folder on clustering. I hope to see you in those videos, and I hope you enjoyed getting this overview of unsupervised learning from the Erdős Institute. All right, bye. Have a great rest of your day.
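To give a small taste of the two tasks described above, here is a minimal sketch using scikit-learn. The lecture doesn't name a library, so scikit-learn (with its `PCA` and `KMeans` classes) is an assumption, as is the synthetic data set; the later notebooks cover these techniques properly.

```python
# A minimal sketch of the two unsupervised tasks mentioned above.
# scikit-learn is an assumption: the lecture does not name a library.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Synthetic unlabeled data: 200 observations, 10 features,
# generated around 3 well-separated group centers.
centers = rng.normal(size=(3, 10)) * 5.0
X = np.vstack([c + rng.normal(size=(67, 10)) for c in centers])[:200]

# Dimension reduction: shrink 10 columns down to 2, e.g. for plotting.
X_2d = PCA(n_components=2).fit_transform(X)
print(X_2d.shape)  # (200, 2)

# Clustering: recover the 3 groups algorithmically, with no labels given.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print(np.unique(labels))  # [0 1 2]
```

Note that neither step was told anything about the true groups; both work from the features X alone, which is exactly what makes them unsupervised.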