Joins are everything youâll need when dealing with databases.
[Aggregate Functions in SQL for Data Analytics đ
Sum, average, count, min, max and all the quickies you need to pull out insights.medium.com](https://medium.com/@aakriti.sharma18/aggregate-functions-in-sql-for-data-analytics-8a36ca77af0a "medium.com/@aakriti.sharma18/aggregate-func..")
JOINS
The real power of SQL, comes from working with data from multiple tables at once. The tables are all part of the same schema in a relational database. The term relational database refers to the fact that the tables within it ârelateâ to one anotherâââthey contain common identifiers that allow information from multiple tables to be combined easily.
Letâs start with a basic JOIN.
A JOIN is used to map together two tables on the basis of a condition specified by the ON clause. After the FROM
statement, we have two new statements: JOIN
, which is followed by a table name, and ON
, which is followed by a couple column names separated by an equals sign.
SELECT col1,col2...
FROM table1 JOIN table2
ON table1.colname = table2.colname
In plain English, this means:
Join all rows from the table1 on to rows in the table2 for which the colname
field in the table1 is equal to the colname
field in the table2.
Now what does this actually do?
It takes every row in table1 and adds the columns of table2 to each row where the column value matches.
The above scenario however useful on paper, is not very practical. Why?
Because data in the real world is not always clean. Itâs often the case that one or both tables being joined contain rows that donât have matches in the other table. The way this is handled depends on whether youâre making an inner join or an outer join. Weâll be discussing different ways to do this to cater to our individual needs đ
INNER JOIN
Inner joins eliminate rows from both tables that do not satisfy the join condition set forth in the ON
statement. In mathematical terms, an inner join is the intersection of the two tables.
OUTER JOIN
When performing an inner join, rows from either table that are unmatched in the other table are not returned. In an outer join, unmatched rows in one or both tables can be returned. There are a few types of outer joins:
LEFT JOIN
returns only unmatched rows from the left table.RIGHT JOIN
returns only unmatched rows from the right table.FULL OUTER JOIN
returns unmatched rows from both tables.
Note : The left or right here is based upon the sequence in which the table names are written in the statement.
Example :
FROM left_table JOIN right_table
Itâs worth noting that LEFT JOIN
and RIGHT JOIN
can also be written as LEFT OUTER JOIN
and RIGHT OUTER JOIN
, respectively.
Filtering the Data with ON
As we know, generally to filter out data with conditions, we use the WHERE clause. If you are not familiar with it, check this out :
[Introduction to SQL for Data Analysis đ
The what why and how of SQL when you are just getting started.medium.com](https://medium.com/@aakriti.sharma18/introduction-to-sql-for-data-analysis-1c4177b36eba "medium.com/@aakriti.sharma18/introduction-t..")
But what if you want to join two tables based on a condition?
We can simply do it with -
SELECT *
FROM table1 JOIN table2
ON table1.col = table2.col
WHERE
We can also remove the WHERE clause and concatenate the condition statement with ON using the AND operator like this -
SELECT *
FROM table1 JOIN table2
ON table1.col = table2.col AND
UNION
JOIN
allows you to combine two datasets side-by-side, but UNION
allows you to stack one dataset on top of the other.
Put differently, UNION
allows you to write two separate SELECT
statements, and to have the results of one statement displayed in the same table as the results from the other statement.
Usage:
SELECT *
FROM table1
UNION
SELECT *
FROM table2
But UNION can only be applied when these conditions are satisfied :
- The number of columns in both the table are same.
- The datatypes ( integer / date / string ) are same in the same order.
An important point to note is that UNION only appends distinct value, meaning if there are any repetitions in the tables, theyâll only be printed once. However, if one wishes to see all the repeated values, UNION ALL can be used instead of UNION.
JOIN on Multiple Keys
Till now we are only joining the tables on the basis of the values of one column. However, this can be done for different columns as well.
This is a good practice as it adds to the accuracy and speed of the query.
SELECT *
FROM table1 JOIN table2
ON table1.col1 = table2.col1 AND table1.col2 = table2.col2
Self JOIN
Sometimes it is useful to join the table to itself. For instance to extract specific details of concurring activities present in different rows. A table can be joined to itself as -
SELECT *
FROM table1 t1 JOIN table1 t2
ON t1.col = t2.col AND .....
Thatâs it for this part!
Congratulations on learning the essentials of SQL, in the next part of the series we would be discussing about applying these concepts to real world scenario like data wrangling and cleaning and also talk about window functions.
Link to the next part :
[Data Cleaning with SQL đ§ź
The foundational preprocessing step explained the easy way!medium.com](https://medium.com/@aakriti.sharma18/data-cleaning-with-sql-eaab6d29d007 "medium.com/@aakriti.sharma18/data-cleaning-..")
Interested in daily data analysis content? Follow me on Twitter and Medium.
Stay tuned! â¨