SQL for Beginners and Analysts — Get Started with SQL using Python.
Overview
- SQL is a mandatory language every analyst and data Science professional should know.
- Learn about the basics of SQL here, including how to work with SQLite database using python.
SQLite — The Lightweight and Quick Response Database.
SQL is a language every analyst and data scientist should know. There’s no escaping from this. You will be peppered with SQL questions in your analytics or data science interview rounds, especially if you’re a fresher in this field.
If you’ve been putting off learning SQL recently, it’s time to get into action and start getting your hands dirty. You would have to learn about databases to work with data so why not start your SQL journey today?
I’ve personally been working with SQL for a while and can attest to how useful it is, especially in these golden data-driven times. SQL is a simple yet powerful language that helps us manage and query data directly from a database, without having to copy it first.
It is also very easy to understand because of the various clauses that are similar to those used in the English language. So writing SQL commands will be a piece of cake for you!
And given the proliferation of data all over the world, every business is looking for professionals who are proficient in SQL. So once you add SQL skill to your resume, you will be a hotshot commodity out in the market. Great, but where to begin?
There are many different database systems out there, but the simplest and easiest to work with is SQLite. It is fast, compact, and stores data in an easy to share file format. It is used inside countless mobile phones, computers, and various other applications used by people every day. And the most amazing part, it comes bundled with Python! Heck, there is a reason why giants like Facebook, Google, Dropbox, and others use SQLite!
In this article, we will explore how to work with databases in Python using SQLite and look into the most commonly used SQL commands. So let’s start by asking the very basic question — what on earth is a database?
Table of Contents
- What is a Database?
- What is SQL?
- Why Should you use SQLite?
- Connecting to an SQLite Database
- Creating tables using SQL
- Inserting values in a table using SQL
- Fetching records from a table using SQL
- Loading a Pandas DataFrame into SQLite Database
- Reading an SQLite Database into a Pandas DataFrame
- Querying SQLite Database
- Where clause
- Group By clause
- Order By clause
- Having clause
- Join clause
- Update statement
- Delete statement
- Drop-Table statement
What is a Database?
It is structured so as to provide easy access and manipulation of the stored data. Organizations use it to store any information that may be necessary for the decision-making process. The major advantages of databases over normal file storage systems are that it reduces data redundancy to a large extent, facilitates sharing of data among various users, and ensures the security of data which may be of immense importance to an organization.
While there are various types of databases and their choice of usage varies from organization to organization, the most basic and widely used is the Relational Database model. It organizes the data into tables where each row holds a record and is called a tuple. And each column represents an attribute for which each record usually holds a value.
A Relational database breaks down different aspects of a problem into different tables so that storing them and manipulating them becomes an easy task. For example, an e-commerce website maintaining a separate table for products and customers will find it more useful for doing analytics than saving all of the information in the same table.
Database Management System (DBMS) is a software that facilitates users and different applications to store, retrieve, and manipulate data in a database. Relational Database Management System or RDBMS is a DBMS for relational databases. There are many RDBMS like MYSQL, Postgres, SQL Server, etc. which use SQL for accessing the database.
What is SQL?
But wait — we’ve been hearing the word ‘SQL’ since the beginner. What in the world is SQL?
SQL stands for Structured Query Language. It is a querying language designed for accessing and manipulating information from RDBMS.
SQL lets us write queries or sets of instructions to either create a new table, manipulate data or a query on the stored data. Being a data scientist, it becomes imperative for you to know the basics of SQL to work your way around databases because you can only perform analysis if you can retrieve data from your organization’s database!
Why Should you use SQLite?
SQLite is a relational database management system based on SQL. It is designed for embedded devices that require fast and reliable data. It is serverless, lightweight, and requires zero-configuration. Also, it reads and writes directly to a disk file that can be easily copied and is platform-independent.
SQLite stores data in variable-length records which requires less memory and makes it run faster. It is designed for improved performance, reduced cost, and optimized for concurrency.
The sqlite3 module facilitates the use of SQLite databases with Python. In this article, I will show you how to work with an SQLite database in Python. You don’t need to download SQLite as it is shipped by default along with Python version 2.5 onwards!
Connecting to an SQLite Database
The first step to working with your database is to create a connection with it. We can do this by using the connect() method that returns a Connection object. It accepts a path to the existing database. If no database exists, it will create a new database on the given path.
The next step is to generate a Cursor object using the cursor() method which allows you to execute queries against a database:
import sqlite3
# create new database
conn = sqlite3.connect('.\sql_db\Demo_table.db')
# create Cursor to execute queries
cur = conn.cursor()
print('Databse created.')
You are now ready to execute queries against the database and manipulate the data. But after we have done that, it is very important to do two things:
- Commit/save the operations that we performed on the database using the commit() method. If we don’t commit our queries, then any changes we made to the database will not be saved automatically
- Close the connection to the database to prevent the SQLite database from getting locked. When an SQLite database is locked, it will not be accessible by other users and will give an error
#Save changes
conn.commit()
print('Change saved')
#close database connection
conn.close()
print('connection closed')
Creating tables using SQL
Now that we have created a database, it is time to create a table to store values.
Let’s create a table that stores values for a customer of an e-commerce website. It stores values like customer name, the id of the product bought, name, gender, age, and the city the customer is from.
A table in SQL is created using the CREATE TABLE command. Here I am going to create a table called Customer with the following attributes:
- User_ID — Id to identify individual customers. This is an Integer data type, Primary key and is defined as Not Null
The Primary key is an attribute or set of attributes that can determine individual records in a table.
Defining an attribute Not Null will make sure there is a value given to the attribute (otherwise it will give an error).
- Product_ID — Id to identify the product that the customer bought. Also defined as Not Null
- Name — Name of a customer of Text type
- Gender — Gender of a customer of Integer type
- Age — Age of the customer of Integer type
SQL keywords are case-insensitive so you can write the commands in UPPERCASE IF YOU WANT!
Any SQL command can be executed using the execute() method of the Cursor object. You just need to write your query inside quotes and you may choose to include a; which is a requirement in some databases but not in SQLite. But it is always good practice so I will include it with my commands.
So, using the execute() method, we can create our table as shown here:
# connect to existing database
conn = sqlite3.connect('Demo_table.db')
cur = conn.cursor()#create table in database
curr.excute('''CREATE TABLE CUSTOMER(User_ID INTEGER PRIMARY KEY NOT NULL,product_ID INTEGER NOT NULL,Name TEXT NOT NULL,Gender TEXT NOT NULL,AGE INTEGER NOT NULL,CITY TEXT);''')#commit and save changes to database
conn.commit()
Inserting values in a SQL table
- execute()
- executescript()
- executemany()
A database table is of no use without values. So, we can use the INSERT INTO SQL command to add values to the table. The syntax for the command is as follows:
INSERT INTO table_name (column1, column2, column3, …)
VALUES (value1, value2, value3, …);
But if we are adding values for all the columns in the table, we can just simplify things and get rid of the column names in the SQL statement:
INSERT INTO table_name
VALUES (value1, value2, value3, …);
Like I said before, we can execute SQL statements using the execute() method. So let’s do that!
cur.execute('''Insert Into Customer ('User_ID','Product_ID','Name','Gender','AGE','CITY') Values (1006, 3, 'Princess Diana', 'Female', 28, 'Amazons');''')
What if we want to write multiple inserts commands in a single go? We could use the executescript() method instead:
#Excute multiple commands at once
cur.excutescript('''Insert Into CUSTOMER Values(1005,3,'Clark Kent','Male',36,'Metropolis');
Insert Into CUSTOMER Values
(1003,4,'Bruce Wayne','Male',39,'Gotham City');''')
What if we want to write multiple Insert commands in a single go? We could use the executesript() method instead:
# Execute multiple commands at once
cur.executescript('''Insert Into CUSTOMER Values (1005, 3, 'Clark Kent', 'Male', 36, 'Metropolis'); Insert Into CUSTOMER Values (1003, 4, 'Bruce Wayne', 'Male', 39, 'Gotham City');''')
Or just simply use the executemany() method without having to repeatedly write the insert into command every time! executemany() actually executes an SQL command using an integer to yield the values:
# Insert maultiple values into table at once
customers = [(1004, 2, 'John Wick', 'Male', 32, 'New York'),(1001, 1, 'Tony Stark', 'Male', 35, 'New York'),(1002, 3, 'Gordon Ramsey', 'Male', 38, 'London')]cur.executemany('Insert Into CUSTOMER Values (?,?,?,?,?,?)', customers)
These methods are not limited to the Insert Into command and can be used to execute any SQL statement.
Now that we have a few values in our table, let’s try to fetch those values from the database.
Fetching Records from a SQL table
- fetchone()
- fetchall()
For fetching values from the database, we use the SELECT command and the attribute values we want to retrieve:
SELECT column1, column2, … FROM table_name;
If you instead wanted to fetch values for all the attributes in the table, use the * character instead of the column names:
SELECT * FROM table_name;
To fetch only a single record from the database, we can use the fetchone() method:
# Fetch all rows of query result
cur.execute('SELECT * FROM CUSTOMER;').fetchone()
To fetch multiple rows, you can execute a SELECT statement and iterate over it directly using only a single call on the Cursor object
# iterate over the rows for
row in cur.execute('SELECT Name FROM CUSTOMER;'):
print(row)
But a better way of retrieving multiple records would be to use the fetchall() method which returns all the records in a list format:
# Fetch all rows of query result which returns a listcur.execute('SELECT * FROM CUSTOMER;').fetchall()
Awesome! We now know how to insert values into a table and fetch those values. But since data scientists love working with Pandas data frame, wouldn’t it be great to somehow load the values from the database directly into a data frame?
Yes, there is and I am going to show you how to do that. But first, I am going to show you how to store your Pandas data frame into a database, which is obviously a better way to store your data!