Data Collection Databases Video Lecture Transcript This transcript was automatically generated, so there may be discrepancies between the video and the text. Hi and welcome back. In this video we continue on with data collection by talking about how you can get to data stored in a database. O let me go ahead and share the juyter notebook. So you'll see that I'm working in a Jupiter notebook that is data and databases complete. That's because there are going to be some code chunks in this notebook that are empty. If you start with the data and databases without the complete, I'm going to fill in those empty slots in this notebook. So this will be the complete version of the notebook. If you want to work along and code with me or try and code them on your own, you should start with the data and databases. On its own notebook and then fill in as we go along. So sometimes your data will not be stored in a single or maybe a handful of data files like we've seen in earlier videos if you've watched those, but is rather stored in a series of interconnected tables that are stored in what is known as a relational database. Here we're going to demonstrate how you can use Python to get the data out of those databases. You can use things like pandas and Numpy to manipulate the data. So in this notebook, we're going to define the concept of a relational database. We will talk about the structured query language, or SQL. We will show you how to access data in the database with SQL Alchemy, and then we'll show you how to read a database with pandas. So what is a relational database? So a relational database. Is a set of tables or data sets that are related to one another through a series of key values. So any nice example I always like to think about is, let's imagine we are at a business that sells physical goods. So not like some virtual business, but a physical goods business and maybe they want to keep track of the purchases that are made and the connecting customers who made those purchases. So in this instance we would have a table, a running log of all the purchases that were made and then along with it in the background there should be a running log of all the customers we have or have had in the past. And so along with this we may have the purchases table. The purchases would usually have a purchase ID which is unique to each purchase. So we can see here 12342 all the way down to 12348. And then along with that, if we want to connect these purchases to a customer, we have to associate with it a customer and the customer table stored in this customer ID column in the purchases table. So these customer ID's will link back to unique customers within the customer table. So we can see here that customer ID 784574 made these two purchases. The reason that these databases are nice and maybe sometimes superior to a single data file is that it's going to allow us to intelligently take complex data sets and quickly and efficiently query it in such a way that we can answer questions. Like, for instance, let's say we know the gender of the customers. Are men more likely to buy an item item a than women say? Maybe we can see this use this sort of data structure to. Get quickly data that would allow us to understand how we should target advertising dollars to individuals for product X. So maybe from here we know product X is similar to a product that is frequently purchased by this type of person based on qualities of the customers linking to purchases. Maybe we also want to use this to help us with our marketing. So how? What groups of people should really receive coupon emails for items that are similar to item Y? And so we could look up any time a purchase has item Y in it, we can look up data about the customers that have made that purchase. So you may have to interact with a database in a in a wide variety of settings, whether it's a personal project, an academic research project, or an industry project. Typically an industry you will be dealing with some kind of database. And so to do that we're going to show you how you can get data out of a database using Python. We'll learn about some other things in practice problem notebook in the practice problem notebook associated with this video and lecture notebook. So while I said we were going to learn about Python, we first have to take a detour into the structured query language or sequel. So sequel is what allows you to submit queries to a database and then get back answers. So standard sequel queries are used for creating databases and tables within a database. So for this instance, someone had to create the purchases table and the customers table within this businesses database. You can then also use these queries to enter data within an established table, or remove data from a table. Maybe you need to delete an old table because you've made a new and improved version, or even an old database. And then finally, what we're going to be interested in is retrieving data that meets the specification, so we want to know all purchases made, for instance, by a particular customer. So, as I said, we're gonna focus mostly exclusively on the retrieve data aspect in this notebook. And then for those of you interested in learning about other common SQL tasks, check out the accompanying practice problems notebook for this lecture. So in SQL the way that you can retrieve your data is to use what's known as a select statement. So select statements have a syntax in SQL as follows. So you type out the word selects and the reason these are in capitals is this is just the common syntax of SQL. It works in lower case or I believe even in mixed case. So you could go lower up or lower, but people tend to use the SQL keywords as capitals. So select is a keyword saying hey I want you to get me something. Star specifies what columns you would like returned. So if you just put a star it will give you all the columns of the table that you're interested in. But you can also specify specific columns from is what's telling sequel. All right, get me these columns from this particular table which I am just putting in a holder here. Table name. And then if you're interested in only certain things, you can say where and then give a conditional statement. So for instance, you could say give me all the purchases from, give me all the price, maybe there's a column called price from the purchases table where and then maybe there's a dates of the purchase where the date was April 7th, 1992, okay. And so we're going to.