keras Video Lecture Transcript
This transcript was automatically generated, so there may be discrepancies between the video and the text.

Hi, everybody. Welcome back. In this video, we're gonna learn how you can build a neural network using the keras package. Let me go ahead and share my Jupyter notebook and we'll get started.

So we're gonna introduce a Python package called Keras, which is gonna give us a little bit more versatility in building neural networks than sklearn does. In particular, we'll introduce Keras, then we'll have a brief aside where we talk about installing Keras, and then we'll introduce and review its syntax for building a feed forward neural network. So let's go ahead and import my packages.

So Keras is a deep learning API written in Python that runs on top of the machine learning platform TensorFlow. In our data collection material, we talked about Python wrappers around APIs, where somebody wrote Python code to make interacting with an online API much easier. This is sort of the same thing. Keras is a Python package written to make building neural networks in TensorFlow much more user friendly. It was developed with a focus on enabling fast experimentation. Being able to go from idea to results as fast as possible is key to doing good research.

So in this notebook, we're gonna show you how to use Keras to build a multilayer network. We're basically gonna try and redo what we did in the sklearn notebook, but this time using Keras. In order to do what we're gonna teach you, you need to have Keras installed, so let's check the installation. If you have it installed, you should be able to run this code chunk just fine, and after it's done you should see the version of Keras that you have installed. So when I wrote this notebook, I had version 2.6.0 installed.
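The notebook's own version-check chunk is not reproduced in the transcript. As a minimal sketch of one way to do the same check, the snippet below asks Python's standard library for the installed version. The helper name `installed_version` is hypothetical, and this is an alternative to importing the package and printing its version, not necessarily what the notebook does.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Hypothetical helper: return the installed version string, or None if missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Prints a version string such as 2.6.0 when Keras is installed, or None when it is not.
print(installed_version("keras"))
```

Because it returns None rather than raising, the check can run before deciding whether an install step is needed.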
And as I'm running it right now, while I record, I still have that version on my laptop. So if you're able to run this, it seems like you're good to go and you should have Keras installed. We'll find out later, when we try and actually do stuff, if it's working the way we think it should. If not, you're gonna have to install Keras.

So one way to install Keras is to go to the documentation, look at the getting started page, then go to the section on installing Keras and try to go through their instructions there. Or, if you're familiar with using pip, you can try pip install keras and see if that works. If you're using conda, you can go to this link, where they have installation instructions for Keras.

If you have an Apple laptop with an M1 chip, you may have some issues. Apple computers with M1 chips may need a little bit of extra help. Earlier in 2022, when I was setting up my new laptop that has an M1 chip, the Keras installation instructions didn't yet play well with the hardware of the M1 chip. That may have been solved by now, but maybe not. So try performing a web search for relevant instructions for Apple computers with M1 chips. If you're still having a hard time, send me a message, or send anyone at the Art Institute who's been helping out with this sort of stuff a message, and we'll make sure to get you directed to the right place and the right person to get you help.

OK. So if you're moving forward with this part of the video, I am assuming that you have installed Keras, and we're gonna find out if it runs the way that we think it should. The first thing we're gonna do is this. If we're gonna build a neural network to predict MNIST handwritten digits, well, we'd better import that data set. So we're gonna import the MNIST data set, and here we've done that. Remember, it already does the train test split for us, which is a nice feature of Keras's version.
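Before these images can go into a feed forward network, each 28 by 28 grid of gray-scale values (0 to 255) has to be flattened into a single row and scaled. Here is a minimal plain-Python sketch of that step on made-up 2 by 2 "images". The function name `flatten_and_scale` and the toy data are illustrative assumptions, and the notebook does this with a NumPy reshape instead.

```python
def flatten_and_scale(images, max_value=255):
    """Turn a list of 2D pixel grids into flat rows scaled to the range 0 to 1."""
    return [
        [pixel / max_value for row in image for pixel in row]
        for image in images
    ]


# Two tiny made-up 2x2 "images" stand in for the 28x28 MNIST digits.
toy_images = [
    [[0, 255], [51, 0]],
    [[255, 255], [0, 102]],
]
X_flat = flatten_and_scale(toy_images)
print(len(X_flat), len(X_flat[0]))  # 2 rows, 4 columns each
print(X_flat[0])
# With NumPy arrays the same step is: X_train.reshape(-1, 28 * 28) / 255
```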
And as a reminder, we have 60,000 observations of 28 by 28 pixels, so each entry of this array is a 28 by 28 grid. That's not gonna work well for our multilayer network, right, where we're assuming each input is a single row of columns. Here, each input would be a 28 by 28 matrix. So we have to reshape the data so that it's in the form of a matrix where each row has 28 times 28 columns. To do that, we use reshape, just real quickly. So X_train.reshape with negative one, and then instead of figuring out what 28 times 28 is, I just put 28 times 28. And I also scale the data at the same time. Remember, with this pixelated data we have a gray scale from 0 to 255, and it's common to just divide by the maximum value of the gray scale, which is 255. OK. And then we can see, for instance, X_train. Here we have a bunch of zeros and then values that are scaled between zero and one. ...

So now we're gonna learn how to build the feed forward network that we built in the sklearn notebook, but in Keras. We're gonna use all of these things, OK? Eventually we're gonna use something called to_categorical. If you either get an error while you're importing to_categorical here, or you get an error later when you're trying to use to_categorical, it's possible that the problem is that you have an earlier version of Keras, and the earlier version of Keras has to_categorical stored somewhere else. So if you're running an earlier version of Keras and this code chunk gives you an error, comment out this line, uncomment the line that's currently highlighted, and see if that works. If not, you may need to do a web search to solve that error.
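Instead of commenting and uncommenting import lines by hand, one option is to try each candidate location in turn. The sketch below is an assumption-laden illustration: `import_first_available` is a hypothetical helper, and the two module paths listed are places where `to_categorical` has lived in different Keras releases, so which one succeeds depends on what is installed.

```python
import importlib


def import_first_available(name, module_paths):
    """Return attribute `name` from the first importable module that has it, else None."""
    for path in module_paths:
        try:
            module = importlib.import_module(path)
        except ImportError:
            continue
        if hasattr(module, name):
            return getattr(module, name)
    return None


# Candidate locations; which one works depends on the installed Keras version.
to_categorical = import_first_available(
    "to_categorical",
    ["keras.utils", "keras.utils.np_utils"],
)
print("found to_categorical" if to_categorical else "to_categorical not found")
```

If neither location works, the function returns None, which is the cue to fall back on a web search for the installed version.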
Just know that between earlier versions of Keras and 2.6, to_categorical has changed locations. ...

So we're going to go through, step by step, the process that you need to take to build a neural network model in Keras. The first step in that process is you have to make an empty model object. So you first make an empty model, and then this model is going to have the layers added to it. In order to make an empty model, you call models, which we imported above, so models.Sequential. It's called sequential, meaning that we're going to keep adding stacks of layers. We're just gonna add layers in a sequential fashion, is what that means.

We're going to build the following model, which is different than the ones we built in sklearn, but it's just for illustrative purposes. Using 16 nodes as the size of each hidden layer means that it's going to train much more quickly than the ones in sklearn did, where we had, what, 502 100 by 200. So 16 by 16 will train relatively quickly. Our input layer is gonna have 28 times 28 nodes. Then we're gonna use ReLU activations both from the input layer to the first hidden layer and from the first hidden layer to the second hidden layer. Both of these hidden layers will have 16 nodes. And then our activation function from the second hidden layer to the output layer is gonna be what's known as a softmax. We talk more about this in the associated practice problems. Softmax just allows you to take the outputs of this layer and turn them into probabilities. It's an estimate of the probability that each observation will be one of these digits. So for instance, the zeroth entry of the output will give you the probability that the observation is a zero. Then we have the probability that the observation is a one, the probability that the observation is a two, and so forth. So we're going to go through and show, you know, I have these layers that I want.
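To make the softmax step described above concrete, here is a small plain-Python sketch. The ten scores are made up for illustration, and subtracting the maximum before exponentiating is a common numerical-stability choice of this sketch, not something the video states.

```python
import math


def softmax(scores):
    """Map a list of real-valued scores to probabilities that sum to 1."""
    shift = max(scores)  # subtracting the max avoids overflow in exp
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Ten made-up output scores, one per digit 0 through 9.
scores = [0.1, -1.2, 0.3, 2.5, 0.0, -0.7, 0.4, 1.1, -2.0, 0.2]
probs = softmax(scores)
print(round(sum(probs), 6))     # the probabilities sum to 1
print(probs.index(max(probs)))  # index of the most likely digit
```

The entry at position k plays the role of the estimated probability that the observation is the digit k.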
How do I add them into a Keras model? We're gonna give you the syntax for that. The key part is, once we write the code for this, we want to run it only once. Why do we only want to run it once? Well, we're gonna do something called model.add, where we add in the layer that we want. Once the model has added the layer once, if we were to go back and rerun this code chunk after we've written all the code, it's going to keep adding more layers. So after we run this, we'll have these two layers. If we were to rerun this code chunk, we would then have four layers. So that's the reason why we only want to run the add step once.

OK. So the add step is gonna allow us to add in these hidden layers. In Keras nomenclature, the layer that we're interested in is a dense layer. The layers of feed forward networks are called dense layers. They're dense because every node is connected to every node in the next layer of the network. So we call layers, which we imported above, we call layers.Dense. So you put in the layer that you want first, then you put in your activation function, ... which is relu, then you need to put in how many nodes you want, so 16. And then, I forget. I know I said to run this only once, but if there's an error, you can rerun it. In the first hidden layer that you put in, you have to tell it what it should expect as the size of the array that you're putting in. By size of the array, it means the dimension of the features. For us, the dimension is going to be 28 times 28, because that's the number of features we have. I forget where input_shape goes, so I'm gonna go ahead and check to see if this works, and if it doesn't, it means that input_shape needs to be inside of Dense. ... OK. Yep. So input_shape needs to be an argument to Dense. ... There we go. And I just realized I accidentally ran that when I wasn't supposed to.
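One way to sidestep the rerun problem is to build the model inside a function, so every call starts from an empty Sequential. The sketch below follows the architecture described for this notebook (two 16 node ReLU layers and a 10 node softmax output) and uses the `input_shape` argument as in the Keras 2.6 of the video. The function name `build_model` and its parameters are hypothetical, and the exact code in the notebook may differ.

```python
def build_model(n_hidden=16, n_features=28 * 28, n_classes=10):
    """Hypothetical helper: return a fresh, uncompiled two-hidden-layer network."""
    # Imported here so the function can be defined even where Keras is missing.
    from keras import layers, models

    model = models.Sequential()
    # Only the first layer is told the number of input features.
    model.add(layers.Dense(n_hidden, activation="relu", input_shape=(n_features,)))
    model.add(layers.Dense(n_hidden, activation="relu"))
    # One output node per digit, turned into probabilities by softmax.
    model.add(layers.Dense(n_classes, activation="softmax"))
    return model


# Each call starts from an empty Sequential, so rerunning never stacks extra layers:
# model = build_model()
# model.summary()
```

The two usage lines are left commented out because they need a working Keras installation.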
So let me go ahead and clear my kernel and we will get back to this spot. ... So I spent all that time telling you guys not to do it, and then I went ahead and did it anyway. So ... OK. Now, before I run it again, I'm gonna go ahead and just copy and paste. The layer is gonna be exactly the same. The only difference between the first layer and the second layer is that once you have the first hidden layer, Keras is able to infer what the input shape should be for the second layer and subsequent layers based upon the output of the previous layer. So this layer will have an output of size 16, and then the second layer, once I add it, will know that that's its input size because of the previous layer.

Now the last thing we have to add. We have hidden layer one, we have hidden layer two. The last thing we have to add is, not a hidden layer three, but the output layer. This is also a dense layer. It's going to be of size 10 because we have 10 different digits. And I want an activation function not of relu, but of softmax. Again, we talk more about what the softmax function actually is in the associated practice problems notebook for this notebook.

OK. Now we've built our model. Well, we haven't finished building it, but we have put the architecture in place. And here is a nice thing. Once you've done something like this, you can call model.summary to see what it looks like. You can see here that we have a dense layer and another dense layer. These are our first two hidden layers, and we can tell because of the output shape, 16. And then finally we have our output layer, which has output size 10. We can also notice that it keeps track of the number of parameters, sort of like a running sum. So this first hidden layer has a large number of parameters, right? Because we have a 28 by 28 input layer going into a 16 node hidden layer.
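The parameter counts for this architecture can be worked out by hand: a dense layer has one weight per input-node pair plus one bias per node. A short sketch of that arithmetic, with `dense_params` as a hypothetical helper name:

```python
def dense_params(n_inputs, n_nodes):
    """Weights plus biases for one fully connected layer."""
    return n_inputs * n_nodes + n_nodes


layer_sizes = [28 * 28, 16, 16, 10]  # input, hidden 1, hidden 2, output
per_layer = [
    dense_params(n_in, n_out)
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
]
print(per_layer)       # [12560, 272, 170]
print(sum(per_layer))  # 13002
```

Nearly all of the parameters sit in the first hidden layer, since that is where the 784 inputs enter.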
I don't wanna try and do the math off the top of my head, but that's a lot of weights that need to be solved for. So you can see how these things can start to overfit on the training data even with a relatively small network.

OK. So now we're going to go ahead and compile the model. When you compile the model, you're telling it what gradient descent algorithm you want it to use, what loss function you want to optimize, and what metrics you'd like to keep track of during the training step. The optimizer we're going to use is called rmsprop. You can find out more about this in the Keras documentation. We're going to use categorical cross entropy, so this is the cost function that we're going to be using. We went over this in the classification metrics material, where it was just called cross entropy or entropy. And then we want to keep track of the accuracy as we train. This will keep track of both the training accuracy and the validation set accuracy, as we'll see in a little bit. To do this, you just call model.compile, and then you put in the optimizer, which we put in, you put in the loss, which we also put in already, and you put in the metrics, which we put in. And again, this is one that you only want to run once.

Now it comes time to fit the model on the training set. To do this, we want not only a training set but also a validation set. So we're gonna do a validation split from the training data, of size 20% of the data set. That's what we do here. And again, remember, for classification I wanna stratify. ...

And now, before we get to the training step, now that we've made our split, we're gonna demonstrate what to_categorical does. Remember, I had you import to_categorical and I said we'd talk about it later. So here's a quick point on that.
And it should be y_train_train. This is what y_train_train looks like as is. It's a list with whatever digit each observation is. So 0, 7, 2, and then 4, 8, 5 are the last three. Now, Keras cannot take this as an input. It will not know what to do with it. Instead, it needs an array that is essentially a collection of one-hot encoded vectors. So for instance, the vector corresponding to the first entry will have a one in the zero place and zeros everywhere else, and the one corresponding to the second entry will have a one in the seventh place and zeros everywhere else. And to_categorical does this for us. So again, I have to_categorical, and I'm gonna change this to y_train_train. As you can see, I now have a list of lists, which is going to be seen as an array, or I believe in Keras it's called a tensor. Here I have a one in the zero place and zeros everywhere else, and so forth.

OK. So now I can fit the model. I'm going to use 100 epochs, which I'm gonna set here. I'm gonna use a batch size of 512, which I'm gonna store in a variable here. And now we're going to do the model fit. For model.fit, you put in X_train_train, that's the features, then you put in to_categorical ... of the output. ... Then we put the number of epochs, then we put our batch size. So this is gonna do that mini-batch gradient descent we talked about. ... And then finally, we're gonna put down that we have a validation set. Let's go to the Keras documentation real quick and check, because I forget what the, ... so what I'm doing right now is just looking for the fit method, because I forget. OK. So, validation_split. OK. So I want validation_data. This is just what I need to call ... the validation. So I believe you do X_val, comma, to_categorical of y_val. ... OK. What went wrong? ... All right. I'm gonna pause my recording and figure out what I did wrong.
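The one-hot encoding described above, which to_categorical produces for the digit labels, can be sketched in plain Python. This is an illustration of the idea only. The function name `one_hot` is hypothetical, and the Keras function returns an array rather than nested lists.

```python
def one_hot(labels, n_classes=10):
    """Return one row per label with a 1.0 in the label's position and 0.0 elsewhere."""
    return [
        [1.0 if position == label else 0.0 for position in range(n_classes)]
        for label in labels
    ]


encoded = one_hot([0, 7])
print(encoded[0])  # [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
print(encoded[1])  # [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0]
```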
I think it probably has to do with this validation data. So I'm gonna figure out for sure and then I will get right back to you.

OK. So I went to the documentation and read further, and it looks like it needed to be in a tuple, not a list. So let's go through and check to see if that fixes it. If not, maybe I'll pause the video again and see if that works. OK? I'm gonna pause the video again and figure out what's going on.

OK. So after searching for a little bit, I think that maybe there was no actual error in the code. Rather, something about trying to fit the model the first time, when there was an error, messed up fitting the model the second time, when there was no error. I restarted my notebook and reran everything from the start, and this ran just fine. So now let's try the ultimate test. Whether you're coding along or just watching, this is the final code, and if you compare it to when I stopped recording for a second and then came back, it should be exactly the same as it was before. And now we're gonna run it and show how it trains. So let's go.

OK. So here you might be worried that this is an error, but this is actually just a warning about something going on. You can see that it's training. How do we know it's training? Because we see updates like epoch 3 of 100. It has a progress bar that shows its progress through the training set. It tells you how long each step takes, and gives you the loss on the training set, the accuracy on the training set, the loss on the validation set and the validation accuracy. And it allows you to keep track as it goes through, for this example, all 100 epochs. Now all of this data that got printed out here is also stored, because we stored this model fit in a variable called history.
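Putting the compile and fit steps together, with the validation data passed as a tuple, might look like the sketch below. It assumes `model` is a Keras model and that the labels have already been one-hot encoded. The function name `compile_and_fit` and its argument names are hypothetical, while the optimizer, loss, metric, epoch count and batch size are the ones mentioned in the video.

```python
def compile_and_fit(model, X_train, y_train_onehot, X_val, y_val_onehot,
                    n_epochs=100, batch_size=512):
    """Hypothetical helper: compile a Keras model, fit it, return the history dictionary."""
    model.compile(
        optimizer="rmsprop",
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    history = model.fit(
        X_train,
        y_train_onehot,
        epochs=n_epochs,
        batch_size=batch_size,
        # The validation features and labels go in together as a tuple.
        validation_data=(X_val, y_val_onehot),
    )
    # history.history is a dictionary of per-epoch losses and metrics.
    return history.history
```

Returning the dictionary directly keeps the per-epoch losses and accuracies at hand for plotting afterwards.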
So history itself also has a history attribute, which is a dictionary where we can access things like loss, which is the loss on the training set. We could look at the accuracy, which is the accuracy on the training set over all the epochs. We can get the validation set loss and we can get the validation set accuracy over all the epochs. So what I'm gonna do is store this dictionary in a variable called history_dict, which will allow me to access this later. And then we can see now we have these keys where we can access the loss and the accuracy on the training set, the validation set loss and the validation accuracy, which allow me to look at the performance over time.

In these images, each dot is going to represent the accuracy on the training set. And maybe what I'll do is change the marker, so it's easier to see. So now the marker for the validation set, in addition to being orange, should also be a triangle. ...

OK. So what you're looking for, essentially, similar to what we did with XGBoost, if you watched that video, is the point at which we start to overtrain on the training data. We can tell that's happening when the training accuracy continues to rise while the validation set accuracy or loss stays about the same, or the validation accuracy starts to decrease, or the validation loss starts to increase. For us, it looks like this starts to happen around 20 epochs. And if we go to the loss, we see something similar. So if we were to go back through, we may retrain from the beginning and then stop training around 20 epochs, because this is where the performance on the training set starts to outpace the performance on the validation set, which indicates that we are starting to overfit on the training data. OK.
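Besides reading the stopping point off the plots, the history dictionary can be searched directly for the epoch with the lowest validation loss. A small sketch on made-up numbers follows. The helper name `best_epoch` is hypothetical, and the key `val_loss` is the name Keras uses for validation loss in this dictionary.

```python
def best_epoch(history_dict, key="val_loss"):
    """Return the 1-based epoch at which history_dict[key] is smallest."""
    values = history_dict[key]
    return values.index(min(values)) + 1


# Made-up numbers shaped like the dictionary stored in history.history.
toy_history = {
    "loss": [0.90, 0.50, 0.30, 0.20, 0.10],
    "val_loss": [0.95, 0.60, 0.45, 0.47, 0.52],
}
print(best_epoch(toy_history))  # 3
```

In the toy data the training loss keeps falling after epoch 3 while the validation loss rises, which is the overfitting pattern described above.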
So another step that you might take in a standard neural network problem is to try messing around with different model architectures. Here we're gonna build basically the exact same model, but we increase the size of the hidden layers from 16 nodes to 32 nodes. The code is the same, except these are now 32s instead of 16s. So model two is its own model object. It's gonna be compiled exactly the same way and fit exactly the same way. OK. And then I store the results in a dictionary.

Here we can see it training, and it trains very quickly. As we're trying to keep up with it, you can see the progress bar advancing as it trains. Now, this little progress bar is much slower on other models that we'll see in the next couple of notebooks for this section. But on this feed forward problem, it's pretty quick. OK?

Now, when you're looking at model architectures and trying to choose which one is best, a standard approach might be to plot the validation set accuracy for the two neural networks. And again, let's go through and change the markers here, just so it's a little bit easier. Not everybody has the easiest time seeing the difference between the two colors. I meant to do this ahead of time, but I forgot. So now the validation accuracy is an orange triangle for the second neural network, the 32 by 32, and the loss is an orange triangle for the 32 by 32 network as well. We can see here that the second neural network, the 32 by 32, does appear to achieve a greater validation set accuracy in about the same time as neural network one, which was the 16 by 16. So it appears that neural network two is a slightly better model for this particular problem.
So if we were only choosing between these two, we would select the 32 by 32, because it has a slightly better accuracy and the training time is virtually the same. So we can look at this dictionary, and this shouldn't be 502. This should be an epoch. We can go through this dictionary and figure out which epoch had the best validation loss, meaning the lowest validation loss, for the second neural network model. And then we can go with this and say, OK, we're gonna retrain our network so that this is the number of epochs we train for.

OK. So we chose the 32 by 32. Now we just need to choose the best stopping time for our training. So we go through, I've retrained it, I've remade the model from scratch. And actually, what I should probably do is, between these two, I should have deleted the model. Sometimes weird things can happen with Keras if you don't delete the model prior to trying to make a new one. Reusing the variable name here can kind of mess with that. So I think it's always good practice to delete the model from your system and then make the new one if you want to use the same variable name. I don't think we'll get an issue here, but you could. ...

OK. So now we have a model that's been trained, and we can then go and get the predictions on the validation set. You can see here we have an array of probabilities. So for example, let's see what the shape of this is. So 12,000 by 10. So what we want then is index zero. Here are the probabilities for the first observation, and it looks like the largest one is here, so it would be 99% likely that it's a nine. So let's see how that was. And we got a seven. So not so great of a guess. Oh, that's on the test set. What's the validation set? OK. On the validation set, it's a nine. Great. So there were a couple of typos in this version of the notebook. Your version will not have those typos.
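Picking the digit with the largest probability, as was just done by eye for the first observation, can be applied to every row and then scored against the true labels. Here is a plain-Python sketch on made-up three-class probabilities. The function names are hypothetical, and with NumPy arrays the usual tool for the first step is `argmax` along axis 1.

```python
def predict_digits(probabilities):
    """For each row of class probabilities, return the index of the largest one."""
    return [row.index(max(row)) for row in probabilities]


def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(actual)


# Made-up probabilities over three classes for four observations.
toy_probs = [
    [0.10, 0.70, 0.20],
    [0.80, 0.15, 0.05],
    [0.30, 0.30, 0.40],
    [0.05, 0.90, 0.05],
]
toy_labels = [1, 0, 2, 0]
toy_preds = predict_digits(toy_probs)
print(toy_preds)                        # [1, 0, 2, 1]
print(accuracy(toy_preds, toy_labels))  # 0.75
```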
And so then you might ask, well, how can we easily get predictions? It's standard to take the class with the largest predicted probability as the prediction. So for instance, again changing this to the validation set, we could go through and get the class with the largest probability for all the observations in the validation set. And we see here that now we have predictions of actual digits, and we can use these to test the accuracy on the validation set. Again, this says test, but I'm gonna make it say validation. So on the validation set, when we went back and remade our model, we had 98.46%. Now, if you went through and deleted model two, like I suggested, and then reran this, you may get a different number, which we could do now, ... and rerun all of this stuff and then see if we still have the same 98% performance. Yep. So we still have that same 98.4% performance, which is great.

OK. So that's how you build a feed forward neural network model in Keras. We saw how to do that and we implemented it on the MNIST data set, after we did a little bit of debugging there in the middle. That's standard. Sometimes you make mistakes, and rather than have you guys sit here and watch me read the documentation and check out some old code to figure out what I was doing wrong, I just paused the video so you didn't have to watch that. But that happens in the real world all the time.

OK. So I hope you enjoyed watching this video. I think building neural networks in Keras is pretty fun and can be pretty cool, so hopefully you think that too. I enjoyed having you come watch this video, and I can't wait until next time to see you. Have a great rest of your day. Bye.