The Confusion Matrix Video Lecture Transcript

This transcript was automatically generated, so there may be discrepancies between the video and the text.

Hi, everybody. Welcome back. In this video we're gonna learn some more performance metrics for classification problems using the confusion matrix. Let me go ahead and share my Jupyter notebook and then we'll get started.

In our K nearest neighbors notebook we introduced the idea of accuracy, but accuracy is not always the best metric. So in this video, and in the corresponding lecture notebook, we're gonna introduce some additional performance metrics for classification problems. In particular, this notebook covers binary classification; we have a later notebook where we talk about multi class classification. The first thing we'll do is mention some deficiencies with accuracy as a metric. Then we'll introduce the confusion matrix and derive some new performance metrics from it, which are the specific metrics we'll define. At the end I will show you the PDF in the repository that has a nice cheat sheet to help you keep track of all of these.

Remember, in K nearest neighbors we defined accuracy as the total number of correct predictions, so things that were actually zero that we predicted to be zero plus things that were actually one that we predicted to be one, divided by the total number of predictions made. For instance, if we made 100 predictions and got 77 of them correct, our accuracy would be 0.77, or, as some people might say, 77% accuracy. This is an OK default metric when we're not giving a lot of thought to the problem, but it can be misleading as well, because it obfuscates which kinds of observations are being labeled incorrectly.

As an example, let's imagine a situation where 10% of our observations are of class one and 90% are of class zero. If I come up with a model that's a little bit silly and just says, well, I'm gonna predict everything I see to be class zero, then automatically that model gets an accuracy of 90%. If we showed that to a layperson, they might see 90% and say, hey, that's pretty good, nine times out of ten you're correct. But the problem with this metric is that it doesn't tell us how we're being incorrect when we are incorrect. We have 90% accuracy, but we don't label any of the class one observations correctly. That can be a big problem. For instance, maybe we're using a classifier to help predict whether or not somebody has a curable or treatable disease. If we're not getting any of those cases correct, we're telling everybody we see that they don't have the disease, and that's a big problem, especially if the disease is contagious, so somebody can spread it to another person.

This is where the idea of the confusion matrix comes in. It allows us to provide context for all of our predictions based upon what their actual label is. For a binary problem, this is what the confusion matrix looks like: the rows represent the actual class and the columns represent the predicted class. This is the binary version, but we can extend it to multi class problems. The classes are put in some kind of increasing order.
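To make both of those points concrete, here's a minimal sketch in Python, assuming scikit-learn and NumPy are available; the toy labels and the predict-everything-zero model are made up for illustration and aren't from the lecture notebook.

import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Toy labels: 90% of the observations are class 0, 10% are class 1.
y_true = np.array([0] * 90 + [1] * 10)

# A "silly" model that predicts class 0 for everything it sees.
y_pred = np.zeros_like(y_true)

print(accuracy_score(y_true, y_pred))   # 0.9, which looks great on its own
print(confusion_matrix(y_true, y_pred))
# [[90  0]
#  [10  0]]
# Rows are the actual class, columns are the predicted class:
# every actual class 1 observation is labeled incorrectly.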
In the matrix, class zero goes first and class one goes second, and each box then represents where an observation falls. For instance, if you had an observation that was actually a zero and you correctly predicted it to be a zero, it would fall in the box labeled TN, which stands for true negative: these are the things that were negative that you correctly predicted to be negative. We use the positive and negative language because it's a binary classification problem. In the next box we have FP for false positive. These are the observations that are actually a zero but that you label a one; your model says it's a positive, but that's incorrect, so it's a false positive. Similarly, we have false negative (FN), where the observation is actually a one but your model falsely says it's a zero. And finally we have true positive (TP), where the observation is actually a one and your model correctly says it's a one. If you're familiar with frequentist statistics and hypothesis tests, a false positive is thought of as a type I error and a false negative as a type II error, so you'll sometimes see type I and type II error rates if you read through the literature. Also, if you're coming from public health, you may see a confusion matrix called a contingency table, and I believe Excel has contingency tables as well. As I said, we can extend this to a multi class problem; all we have to do is add additional rows and columns for each of the classes. But we will lose this true negative, false positive, false negative, true positive labeling scheme when we get to multiple classes.

So let's talk about a few metrics derived from the confusion matrix. There are lots of them; we're gonna touch on a few, and at the end I provide a link to a Wikipedia article that has a nice comprehensive list.

Two of the most popular are called precision and recall. Precision takes your true positives and divides by the total number of predicted positives, so true positives plus false positives. What it's saying is: out of all the points that I predicted to be class one, what fraction of them are actually class one? One way to interpret this is, when your algorithm tells you, hey, this is a class one, how much should you actually trust it? So it's like a conditional probability: given that my classifier says this observation is class one, what's the probability that it's actually class one?

Recall is another popular one. This is true positives divided, instead of by the predicted positives, by the actual positives, so true positives plus false negatives. Out of all the data points actually in class one, what fraction did the algorithm correctly predict? This is an estimate of the probability that the algorithm correctly detects a class one data point, so it's the conditional probability that we predict a one given that it is actually a one.

A quick sketch of these two formulas is below; after that we'll go ahead and build a K nearest neighbors model.
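Before the worked example, here's that arithmetic as a tiny sketch; the counts tp, fp and fn are hypothetical numbers I chose for illustration, not values from the notebook.

# Hypothetical counts pulled off a confusion matrix.
tp = 40   # true positives
fp = 10   # false positives
fn = 5    # false negatives

# Precision: of everything we predicted to be class 1, what fraction really is class 1?
precision = tp / (tp + fp)   # 0.8

# Recall: of everything that actually is class 1, what fraction did we catch?
recall = tp / (tp + fn)      # 0.888...

print(precision, recall)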
We're gonna build a K nearest neighbors model on the iris data set using five neighbors, and we'll demonstrate how to calculate precision and recall. The first cells just import the data and make a train test split. Then I make my K nearest neighbors classifier, fit the classifier, and make a prediction on the training set. So here's my prediction on the training set.

Now, you might be wondering, wasn't the iris data set multi class? Our iris data set can be three possible types of iris, and we've turned it into a binary classification problem by saying I'm only interested in whether my iris is a virginica, which in the data set means the target, y, is equal to two. So we're taking a multi class problem and turning it into a binary classification problem where we want to know: is this iris a virginica or not? These are our predictions on that data with K nearest neighbors and the number of neighbors equal to five.

sklearn provides a nice and easy way to get the confusion matrix for any classifier: the confusion_matrix function. From sklearn.metrics we import confusion_matrix, and then, just like the mean squared error from our regression notebooks, we first put the actual values, the true values, followed by our predicted values. So we call confusion_matrix with our actual values, y_train, and then our predicted values, y_train_pred. The output says we've got 77 true negatives, 3 false positives, 2 false negatives and 38 true positives. Here I'm storing all of this: entry [0, 0] is the true negatives, [0, 1] is the false positives, [1, 0] is the false negatives and the true positives are at [1, 1]. Then I can calculate the recall, which is TP divided by TP plus FN, and the precision, which is TP divided by TP plus FP. We've got a recall of 0.95, or 95%, and a precision of 0.9268.

One alternative, instead of calculating these by hand from the confusion matrix, is to use sklearn's precision_score and recall_score functions. First recall_score, which again takes your actual values followed by your predicted values, and then the same thing with precision_score: your actual values followed by your predicted values. Oh, I forgot to import them: from sklearn.metrics import precision_score, recall_score. One thing that's nice about importing, if you didn't know this before, is that if you have two things you're importing from the same package or sub package, you can import them together with a comma separating them. Both precision_score and recall_score live in sklearn.metrics, so I can import both with just a comma separating them. And we can see that doing it by hand gives the same answers as using these functions.
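Here's a consolidated sketch of the workflow just described, assuming scikit-learn; the variable names, test size and random_state are my own choices rather than the notebook's exact code, so the counts you get may differ slightly from the 77, 3, 2 and 38 shown in the video.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Load iris and turn it into a binary problem: is this a virginica (target == 2)?
X, y = load_iris(return_X_y=True)
y = (y == 2).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=440)

# Fit K nearest neighbors with five neighbors and predict on the training set.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
y_train_pred = knn.predict(X_train)

# confusion_matrix takes the actual values first, then the predicted values.
tn, fp, fn, tp = confusion_matrix(y_train, y_train_pred).ravel()

# Recall and precision by hand, then with sklearn's helpers; each pair matches.
print(tp / (tp + fn), recall_score(y_train, y_train_pred))
print(tp / (tp + fp), precision_score(y_train, y_train_pred))

The stratify=y argument just keeps the zero/one balance the same in the train and test sets; that's an assumption on my part, not something stated in the video.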
Other types of metrics come from these same four counts. If you look back, recall was true positives divided by actual positives; this is known as the true positive rate. For each of the four possible outcomes in a classification there's a corresponding rate: a true negative rate, a false positive rate, a false negative rate and a true positive rate. These are the extensions of recall to the other outcomes, if that makes sense.

For instance: given that an observation is actually a positive, what is the probability that we correctly predict it as a positive? That's the true positive rate, or recall. Given that an observation is actually a positive, what is the probability that we incorrectly predict it as a negative? That's the false negative rate, which is FN divided by the actual positives, TP plus FN. Similarly, we have rates for the negatives. Given that an observation is actually a negative, what is the probability that we correctly predict it as a negative? That's the true negative rate, which is true negatives divided by all the actual negatives, so true negatives plus false positives. And finally the false positive rate, which is false positives divided by the actual negatives. There aren't nice functions for these in sklearn except for the true positive rate, so we would have to calculate them by hand, which I've done here for us (there's a small sketch of the same calculation at the end of this section). These are our rates, and ideally we want classifiers that have high true positive and true negative rates and low false negative and false positive rates.

The last two metrics we'll touch on come up quite a bit in public health settings; maybe you've heard of them when reading about COVID diagnostic tests. There's sensitivity, which is the probability that your classifier correctly identifies a positive observation, so again the true positive rate, which is the same as recall. Because different fields all developed these sorts of things at different times, we have a bunch of different names for the same thing. Then there's specificity, which is the probability that a classifier correctly identifies a negative observation: given that an observation is a negative, what's the probability that we classify it as a negative? That's the same as your true negative rate. Here are the formulas: sensitivity is the true positive rate, or recall, and specificity is the true negative rate. So these aren't new metrics, they're just different names, but they're used commonly enough that you're gonna want to know them if you can remember them.

As you can see, there's a lot to remember here; there are a lot of different metrics that look very similar, so it's easy to slip up. To help with that, there's a nice cheat sheet, a PDF that you can find in the repository called the confusion matrix cheat sheet. It has our confusion matrix and then a nice table of the different metrics: accuracy, which we talked about; basically the opposite of accuracy, which is called the total error rate; our true positive rate and true negative rate with all their other names; the false positive rate and false negative rate; and then precision. Hopefully that's helpful. There are also some more metrics in a similar vein which we didn't talk about. You can find these at the Wikipedia entry linked at the bottom; for instance, it lists the false positive rate, the true negative rate, and a lot of other ones where you can see the formula and then go read the Wikipedia entry for it.
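Here's the by-hand rate calculation mentioned above as a small sketch, plugging in the confusion matrix counts from the iris example (77 true negatives, 3 false positives, 2 false negatives, 38 true positives).

# Confusion matrix counts from the iris example in the video.
tn, fp, fn, tp = 77, 3, 2, 38

tpr = tp / (tp + fn)   # true positive rate = recall = sensitivity
fnr = fn / (tp + fn)   # false negative rate (tpr + fnr = 1)
tnr = tn / (tn + fp)   # true negative rate = specificity
fpr = fp / (tn + fp)   # false positive rate (tnr + fpr = 1)

print(tpr, fnr, tnr, fpr)   # 0.95, 0.05, 0.9625, 0.0375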
OK, so we have all these metrics. Which do we use? Well, it's not an easy question to answer. It depends. Like a lot of the answers in this boot camp, a lot of what you do depends upon the real world setting you're working in and the project you're working on. For instance, sometimes you're gonna be more concerned with making sure that if you say something is a one, you'd better be right that it's a one. Or maybe you're more concerned with making sure you capture all of the ones, even if that means you get some false positives in there.

Public health often focuses on sensitivity and specificity because they translate easily into real world health impacts. In the case of a deadly disease where we have successful regimens to treat it, maybe we want to go for tests that have high sensitivity. Remember, sensitivity is the true positive rate, so this basically measures our probability of correctly identifying a one. However, that might come at the cost of some false positives. If it's a deadly disease with successful treatment regimens, maybe we're willing to accept a higher false positive rate as long as we're capturing all of the actual cases. A lot of times what happens is you'll have a nice easy test that isn't very invasive for the patient, and then if someone tests positive on that easier test, you follow up with a slightly more invasive test to confirm that, yes, they have the disease. Another example out of public health: maybe the specific disease or condition in question doesn't tend to cause severe outcomes, and the treatment or test for it is highly invasive or expensive. In that setting, maybe you want to prioritize specificity, so that if somebody doesn't have the condition, you're less likely to say that they do.

So careful consideration of metrics is important in any setting you're working in. Well, maybe not any setting: there will be some settings where you can probably just go with accuracy and not worry about it, because it's not a high consequence problem. But a lot of times in the real world there is a lot of thought put into which metrics you use and what that means downstream for the different stakeholders in the problem.

OK, we're gonna talk about these more in problem sessions as well as practice problems, and you'll probably think about them more as you work on problems in data science in your real life or for various projects. I hope you enjoyed learning about the confusion matrix and all of the related metrics that go along with it. I enjoyed having you here to watch this video, and I hope you have a great rest of your day. Bye.