Joins in SQL 🤝

Joins in SQL 🤝

¡

5 min read

Joins are everything you’ll need when dealing with databases.

[Aggregate Functions in SQL for Data Analytics 📉
Sum, average, count, min, max and all the quickies you need to pull out insights.medium.com](https://medium.com/@aakriti.sharma18/aggregate-functions-in-sql-for-data-analytics-8a36ca77af0a "medium.com/@aakriti.sharma18/aggregate-func..")

JOINS

The real power of SQL, comes from working with data from multiple tables at once. The tables are all part of the same schema in a relational database. The term relational database refers to the fact that the tables within it “relate” to one another — they contain common identifiers that allow information from multiple tables to be combined easily.

Let’s start with a basic JOIN.

A JOIN is used to map together two tables on the basis of a condition specified by the ON clause. After the FROM statement, we have two new statements: JOIN, which is followed by a table name, and ON, which is followed by a couple column names separated by an equals sign.

SELECT col1,col2...
FROM table1 JOIN table2
ON table1.colname = table2.colname

In plain English, this means:

Join all rows from the table1 on to rows in the table2 for which the colname field in the table1 is equal to the colname field in the table2.

Now what does this actually do?

It takes every row in table1 and adds the columns of table2 to each row where the column value matches.

The above scenario however useful on paper, is not very practical. Why?

Because data in the real world is not always clean. It’s often the case that one or both tables being joined contain rows that don’t have matches in the other table. The way this is handled depends on whether you’re making an inner join or an outer join. We’ll be discussing different ways to do this to cater to our individual needs 👇

INNER JOIN

Inner joins eliminate rows from both tables that do not satisfy the join condition set forth in the ON statement. In mathematical terms, an inner join is the intersection of the two tables.

OUTER JOIN

When performing an inner join, rows from either table that are unmatched in the other table are not returned. In an outer join, unmatched rows in one or both tables can be returned. There are a few types of outer joins:

  • LEFT JOIN returns only unmatched rows from the left table.
  • RIGHT JOIN returns only unmatched rows from the right table.
  • FULL OUTER JOIN returns unmatched rows from both tables.

Note : The left or right here is based upon the sequence in which the table names are written in the statement.

Example :

FROM left_table JOIN right_table

It’s worth noting that LEFT JOIN and RIGHT JOIN can also be written as LEFT OUTER JOIN and RIGHT OUTER JOIN, respectively.

Filtering the Data with ON

As we know, generally to filter out data with conditions, we use the WHERE clause. If you are not familiar with it, check this out :

[Introduction to SQL for Data Analysis 📈
The what why and how of SQL when you are just getting started.medium.com](https://medium.com/@aakriti.sharma18/introduction-to-sql-for-data-analysis-1c4177b36eba "medium.com/@aakriti.sharma18/introduction-t..")

But what if you want to join two tables based on a condition?

We can simply do it with -

SELECT *
FROM table1 JOIN table2
ON table1.col = table2.col
WHERE

We can also remove the WHERE clause and concatenate the condition statement with ON using the AND operator like this -

SELECT *
FROM table1 JOIN table2
ON table1.col = table2.col AND

UNION

JOIN allows you to combine two datasets side-by-side, but UNION allows you to stack one dataset on top of the other.

Put differently, UNION allows you to write two separate SELECT statements, and to have the results of one statement displayed in the same table as the results from the other statement.

Usage:

SELECT *
FROM table1
UNION
SELECT *
FROM table2

But UNION can only be applied when these conditions are satisfied :

  • The number of columns in both the table are same.
  • The datatypes ( integer / date / string ) are same in the same order.

An important point to note is that UNION only appends distinct value, meaning if there are any repetitions in the tables, they’ll only be printed once. However, if one wishes to see all the repeated values, UNION ALL can be used instead of UNION.

JOIN on Multiple Keys

Till now we are only joining the tables on the basis of the values of one column. However, this can be done for different columns as well.

This is a good practice as it adds to the accuracy and speed of the query.

SELECT *
FROM table1 JOIN table2
ON table1.col1 = table2.col1 AND table1.col2 = table2.col2

Self JOIN

Sometimes it is useful to join the table to itself. For instance to extract specific details of concurring activities present in different rows. A table can be joined to itself as -

SELECT *
FROM table1 t1 JOIN table1 t2
ON t1.col = t2.col AND .....

That’s it for this part!

Congratulations on learning the essentials of SQL, in the next part of the series we would be discussing about applying these concepts to real world scenario like data wrangling and cleaning and also talk about window functions.

Link to the next part :

[Data Cleaning with SQL 🧼
The foundational preprocessing step explained the easy way!medium.com](https://medium.com/@aakriti.sharma18/data-cleaning-with-sql-eaab6d29d007 "medium.com/@aakriti.sharma18/data-cleaning-..")

Interested in daily data analysis content? Follow me on Twitter and Medium.

Stay tuned! ✨

Â