Introduction to Recurrent Neural Networks
Video Lecture Transcript
This transcript was automatically generated, so there may be discrepancies between the video and the text.

12:51:21 Welcome back. In this video, we're going to continue to learn about neural networks by introducing recurrent neural networks,
12:51:29 a network architecture that can accomplish different tasks.
12:51:33 So we will introduce recurrent neural networks.
12:51:38 We'll talk about the types of problems they're designed for.
12:51:39 We'll give an overview of the basic architecture. There are a lot of different architectures,
12:51:44 so we'll just kind of give you an idea of the most basic.
12:51:46 We'll demonstrate the setup for these architectures and then build a recurrent neural network, a very basic one, to predict IMDB sentiment.
12:51:57 So recurrent neural networks are set up for sequential data.
12:52:02 Just like convolutional neural networks are set up for grid-based data and feed-forward networks are set up for flat data,
12:52:09 there are other types of data that are sequential: for instance, time series, or sequences of text like natural language, or music, or sound files. These are sequential data where a flat format ignores some of the dependency in the structure of the data. For instance, the second word in a
12:52:30 sentence is probably very much related to whatever the first word of the sentence was.
12:52:34 And the third word is related to the first two words, and so forth.
12:52:38 Well, if we were to take that and flatten it so that each word is independent of every other
12:52:43 word used in the sentence, that's probably not going to be as informative as keeping it in its sequence.
12:52:50 And so that's the idea of why you have a different architecture for this sort of data set.
12:52:56 So there are a wide variety of architectures depending upon the specific problem you're trying to solve when it comes to recurrent neural networks.
12:53:06 To demonstrate the general idea, we have this really simple architecture where we have some inputs down here: x_1, x_2, x_3, x_4.
12:53:16 So this is one in the sequence, two in the sequence,
12:53:19 three in the sequence, four in the sequence, and then corresponding outputs.
12:53:24 And then between each input and each output is a hidden layer, and the hidden layers also feed into each other.
12:53:30 So this might look confusing. The way we want to think about this is that for each step of the sequence,
12:53:35 there's an observation x_i, y_i. Then there's a hidden layer between x_i
12:53:41 and y_i. But in addition to taking the inputs from the x's to the y's, the hidden layers can also take inputs from the previous hidden layer.
12:53:50 And so what this kind of looks like is that each layer of your recurrent neural network is itself a feed-forward network.
12:53:57 So for instance, this first one is a feed-forward network that has the nodes being fed into it from x_1, then the hidden layer nodes for h_1, and then y_1. In additional layers, though, you also have this extra input: so for instance, for h_2 there are inputs from h_1,
12:54:15 for h_3 there are inputs from h_2, and so forth.
12:54:18 And it can get more complicated than that; this is one of the simplest recurrent neural network architectures that you'll see.
12:54:26 And so, how does this work schematically? What do the formulas look like?
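As a sketch, the recurrence described next can be written out like this, with f and g standing for the activation functions and bias terms left out to match the verbal description:

    h_1 = f(W_{xh} x_1)
    h_t = f(W_{xh} x_t + W_{hh} h_{t-1})    for t > 1
    y_t = g(W_{hy} h_t)

Here W_{xh}, W_{hh}, and W_{hy} are the input-to-hidden, hidden-to-hidden, and hidden-to-output weight matrices.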
12:54:31 So for the weighted sums that you're setting up for a recurrent neural network,
12:54:37 they look something like this. So h_1's nodes are given by the activation function applied to a weighted sum of the inputs.
12:54:44 And then for each subsequent hidden layer, you have an activation function applied to a weighted sum of the input plus a weighted sum of the previous hidden layer.
12:54:54 And here W_xh and W_hh have the same entries regardless of the hidden layer being considered. So the weights in these matrices are the same, regardless of whether it's hidden layer 2, hidden layer 3, and so forth.
12:55:10 And then the outputs are given by a W_hy matrix times
12:55:13 the hidden layer nodes, with a different activation function
12:55:16 possibly being applied. So here we're gonna talk about IMDB
12:55:22 sentiment analysis; we're gonna implement this recurrent neural network setup on an IMDB sentiment analysis problem.
12:55:30 So you're gonna run this, and if it's your first time running it, it may take a while. It's not my first time;
12:55:35 it still took a little bit. And so each entry of the data set here is going to have a movie review from IMDB, as well as a score of, I believe, negative one and one, or 0 and 1, giving the sentiment of that review. So what you have to do to load
12:55:50 this is you first have to set a maximum number of features, and the features here are the words in the vocabulary.
12:55:56 So, for instance, if you chose 10, the 10 most frequently used words across all reviews would be your features.
12:56:02 Here I'm choosing 10,000.
12:56:06 So the 10,000 most frequently used words here are my features.
12:56:10 Then, when you call imdb.load_data, you put in the argument num_words just equal to max_features.
12:56:17 Here's what it looks like. So it's just a list of indices, and each index here represents a word.
12:56:24 And then this was a positive review. You can tell that because the y_train observation is 1.
12:56:31 So if you wanted to see what this looks like, you can run this code. It takes a dictionary, which is the word index, and the word index has both the word and then the
12:56:42 index for that word.
12:56:45 So I'm reversing that, so I can put in an index and get the word out.
12:56:47 And then I'm showing you what the review is. Where you see question marks are words that were not in the top 10,000 most used words of the data set; that's where you see question marks in this review.
12:57:00 So our network architecture will be slightly different than the one we showed before.
12:57:05 Instead of having an output at each layer, you're just going to have an output at the final layer, which denotes the 1 or the 0 (maybe negative one
12:57:15 if it's a negative review). So it's slightly different, because not every input is going to have an output.
12:57:21 So the inputs here are going to be words in the sequence.
12:57:24 So this will be a vector representing the word in the first part of the sequence, the word in the second part of the sequence,
12:57:30 the word in the third part of the sequence.
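Putting the loading and decoding steps just described into one place, a rough sketch might look like the following. It assumes the TensorFlow-bundled Keras and the standard keras.datasets.imdb API; the index offset of 3 in the decoding step follows the usual Keras convention of reserving the first few indices for padding/start/unknown tokens, which the video doesn't spell out:

    from tensorflow.keras.datasets import imdb

    max_features = 10000  # keep only the 10,000 most frequently used words

    # each review is a list of word indices, each label is 0 (negative) or 1 (positive)
    (X_train, y_train), (X_test, y_test) = imdb.load_data(num_words=max_features)

    # word_index maps word -> index; reverse it so we can go from an index back to the word
    word_index = imdb.get_word_index()
    reverse_word_index = {index: word for word, index in word_index.items()}

    # decode the first review; words outside the top 10,000 come back as "?"
    decoded_review = " ".join(reverse_word_index.get(i - 3, "?") for i in X_train[0])
    print(decoded_review)
    print("label:", y_train[0])  # 1 here means a positive review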
And so these vectors will be series of zeros and ones: a one in the position lining up with what the word is, and zero
12:57:40 otherwise. There's also going to be a slight difference that we're not going to talk about too much in depth,
12:57:47 but I will talk about it when we get there. So here's how we can do this.
12:57:52 The reviews are not all necessarily of the same length, so we have to pre-process our data so that it cuts off the reviews so that they are the exact same length.
12:58:03 If they are too long it will cut them off; I can't remember if it's the end or the beginning,
12:58:08 but the code does it for us. It will cut off the words as necessary, either from the end or from the beginning, and if it's too short of a review, it will add zeros to the end.
12:58:19 So empty words, no words there. So, for instance, if our cutoff was 10 words and our review was 5, it would add 5
12:58:28 no-words at the end to pad out the sequence, the reason being that with our architecture, all of our sequences have to be the same length.
12:58:36 So to do this, you'll use pad_sequences.
12:58:41 I want to point out that pad_sequences, in this version of Keras that I'm using, is in utils,
12:58:49 but if you're using an older version, you may need to check
12:58:53 keras.preprocessing.sequence
12:58:59 if you are using older Keras. Okay. So this is the shape of our data before we converted it into these sequences.
12:59:12 And now we're going to see what it looks like.
12:59:14 So we're going to do pad_sequences on
12:59:19 X_train, and then the maximum length is max_length.
12:59:24 So I'm making sequences of maximum length 100, and again pad_sequences, this time for X_test with
12:59:34 max_length. And then we can see what this looks like.
12:59:37 We have 25,000 rows of 100 for each of the two data sets.
12:59:42 And so this is what it looks like. And so here we can see that we've added padding at the beginning...
12:59:49 somehow, I don't know; this is maybe a remnant from an older version of the notebook.
12:59:54 I guess if we did, for instance, 1,000, you could see.
13:00:00 Now you can see the padding at the beginning. Okay, but we're gonna go back to 100.
13:00:06 Okay. Now I'm just making my validation set, and we're ready to build our network.
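As a rough sketch of the padding and validation-split steps just described (the import path depends on your Keras version as noted above, and the size of the validation split is an assumption, since the video doesn't state it):

    from tensorflow.keras.utils import pad_sequences
    # on older Keras versions, use instead:
    # from tensorflow.keras.preprocessing.sequence import pad_sequences

    max_length = 100  # truncate longer reviews and pad shorter ones so every sequence has length 100

    X_train = pad_sequences(X_train, maxlen=max_length)
    X_test = pad_sequences(X_test, maxlen=max_length)
    print(X_train.shape, X_test.shape)  # (25000, 100) (25000, 100)

    # hold out part of the training data as a validation set (the 5,000 here is a hypothetical split size)
    X_val, y_val = X_train[:5000], y_train[:5000]
    partial_X_train, partial_y_train = X_train[5000:], y_train[5000:]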
13:00:11 So we're gonna import the stuff. Now we're gonna do two things. The first thing is we're going to put in an embedding layer; we can think of an embedding layer as pre-processing that typically gets done for natural language processing tasks.
13:00:28 We're not going to dive into this because this is not a natural language processing boot camp, but for the embedding layer, just basically think of it as doing something like PCA on the words.
13:00:40 So the output will be slightly different than the input; it's not going to be those sequences of zeros and ones.
13:00:46 It's going to give you some sort of representation of the data in a lower-dimensional space.
13:00:52 Then the simple recurrent neural network, that's gonna be this structure that we've provided here.
13:00:59 Okay. So the first thing we need to do, like with our other two architectures, is we make a sequential model.
13:01:06 Then the first thing we're going to add is this embedding layer. So we do
13:01:08 layers.Embedding, and then we're going to put in the desired size.
13:01:13 So I'm going to make it 32, for no particular reason, and then we're going to... oh, I'm doing it backwards.
13:01:23 So max_features should come first. So this is just telling it the size of our dictionary.
13:01:27 So the size of the dictionary is max_features, and then we're going to project that down into a 32-dimensional space. The next thing we need to add is the simple recurrent neural network layer.
13:01:40 So layers.SimpleRNN.
13:01:42 So the first input should be the same; it gives you the size of the
13:01:48 layer, so that for us is 32.
13:01:50 We're gonna do return_sequences. And return_sequences, we would want this to be true if we're going to have a second hidden layer.
13:02:02 So you could have x_1 go to h_1,1, which goes to h_1,2; x_2 go to h_2,1, which goes to h_2,
13:02:09 2, and so forth, so you can have a hidden layer on top of a hidden layer.
13:02:13 We're just doing a single hidden layer, so we're going to do return_sequences=False.
13:02:21 Okay. So now here's what our model looks like, and then we just add the final dense layer with the sigmoid activation.
13:02:32 Sigmoid activation. So this is going to allow for the binary classification.
13:02:40 So this is what our model looks like. Now we're going to compile the model.
13:02:44 We've seen this before, and then we're going to fit it for 10 epochs.
13:02:49 Okay. So this is running; we'll just wait for this to run.
13:03:12 Okay. So the model is done training now, so we can look at the history that we get from that.
13:03:19 So this is the training and validation set accuracy.
13:03:24 So it looks like we level off around the fourth epoch here.
13:03:29 And then we can kind of see the same here with the loss function.
13:03:34 So if you wanted to learn more about recurrent neural networks, you'll just have to dive in more. You can learn more about the theory
13:03:41 with this book, Neural Networks and Deep Learning, and you can learn more about applying them in Python with this book, Deep Learning with Python.
13:03:48 That is how you can implement them in Keras. Recurrent neural networks in that book start in Chapter 6, and in the theory book they start in Chapter 7.
13:03:57 So just as a note: most times, if you are going to build a recurrent neural network from scratch with Keras, you aren't going to use SimpleRNN, because it's not the best model architecture.
13:04:08 But it's good enough to learn the basics, and now you have a nice introduction.
13:04:11 So you can build upon that by checking out these texts that I've linked to.
13:04:15 Okay. I hope you enjoyed learning about recurrent neural networks.
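For reference, here is a minimal sketch pulling together the model-building, compiling, and fitting steps walked through in this video, continuing from the earlier snippets (max_features, the padded arrays, and the validation split). The 32-dimensional embedding, the single SimpleRNN layer with return_sequences=False, the sigmoid output, and the 10 epochs come from the walkthrough; the optimizer, loss, metric, and batch size are assumed typical choices, since the video doesn't state them:

    from tensorflow import keras
    from tensorflow.keras import layers

    model = keras.Sequential()

    # embedding layer: maps each of the max_features word indices to a 32-dimensional vector
    model.add(layers.Embedding(max_features, 32))

    # a single simple recurrent layer; return_sequences=False because no second recurrent layer is stacked on top
    model.add(layers.SimpleRNN(32, return_sequences=False))

    # final dense layer with a sigmoid activation for the binary positive/negative classification
    model.add(layers.Dense(1, activation="sigmoid"))

    model.summary()

    # optimizer, loss, and metric are assumptions here, not taken from the video
    model.compile(optimizer="rmsprop", loss="binary_crossentropy", metrics=["accuracy"])

    # batch_size is also an assumption
    history = model.fit(partial_X_train, partial_y_train,
                        epochs=10,
                        batch_size=128,
                        validation_data=(X_val, y_val))

The history object returned by fit is what the training and validation accuracy and loss curves discussed above are plotted from.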