End-to-End Data Analysis Project: Analyze Twitter’s Reaction to Taylor Swift with HarperDB (Part 2)
Data visualization is the process of representing complex data and relationships in visual forms such as charts, bar graphs, plots and infographics. It is an efficient way of communicating findings because it makes patterns, trends and outliers easier to spot in large amounts of data. In this tutorial, we will focus on creating beautiful, insight-driven visualizations from a database of 120k+ data points hosted on HarperDB Cloud.
In the previous part of this series, [Analyze Twitter’s Reaction to Taylor Swift with HarperDB](https://medium.com/@aakriti.sharma18/analyze-twitters-reaction-to-taylor-swift-with-harperdb-6207a50aee5d), we did the following:
- Scraped data from Twitter through Twint
- Performed feature engineering on the dataset
- Used Python to process the text data through stop word removal, lemmatization and noise reduction
- Created a HarperDB instance
- Set up the cloud database and defined the schema
- Loaded data to the cloud database
Next, we will connect our cloud database to a visualization tool by installing drivers, configuring the data source manager, locating the schema, and then analyzing the data through infographics.
Installing Drivers
- Download the ODBC Driver from here.
- Run the downloaded .exe file to install it.
- Click on Finish after the installation is complete.
Configuring DSN
- After the installation completes, a Data Source manager launches automatically. If it doesn’t, you can open it manually from the search panel.
Upon selection of CData HarperDB Source, you’ll be presented with the following screen:
- Fill in the Server (link to your instance), User and Password (the credentials you use to log in to the instance) fields and click Test Connection. You should see a confirmation like this:
The connection to your HarperDB instance is now established. Next, we’ll connect to it from our visualization software and fetch the data for the dashboard.
Connection to Tableau Desktop
- Launch Tableau Desktop. Under the To a Server section, click More and search for HarperDB by CData.
- Click HarperDB by CData to establish a connection with the cloud-hosted HarperDB instance.
- Enter the connection string in the following format:
Server=<link to instance>;UseSSL=True;User=<username to log in to instance>;Password=<instance password>;
- Voila! You are connected to your hosted database. Your screen should look similar to this:
- From the Select Schema dropdown, select the schema you defined before loading data into your instance. In my case, it is named data.
- A list of tables appears under the Table section; drag in the ones you want to visualize. I’ll start with the scraped data stored in the table named tweets. If you want the data to be static and stored on your local device, select the Extract radio button in the top right corner. I chose to keep the connection Live so that any updates to the data are reflected on the dashboard.
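The connection string entered in Tableau is a plain list of Key=value; pairs. If you would rather assemble it in a script than type it by hand, here is a minimal Python sketch. The function name and the example values are hypothetical, and values are left unquoted, which is an assumption about what the driver accepts. The helper only formats text in the layout from the connection step and never contacts HarperDB; it rejects semicolons inside values instead of guessing at the driver's escaping rules.

```python
def build_connection_string(server: str, user: str, password: str,
                            use_ssl: bool = True) -> str:
    """Format HarperDB connection details as Key=value; pairs.

    Hypothetical helper: it only builds text and does not validate the
    values against a live instance.
    """
    parts = {
        "Server": server,
        "UseSSL": "True" if use_ssl else "False",
        "User": user,
        "Password": password,
    }
    for key, value in parts.items():
        # A semicolon inside a value would end that pair early, so refuse it
        # rather than guess how the driver expects it to be escaped.
        if ";" in value:
            raise ValueError(f"{key} must not contain ';'")
    # Dicts keep insertion order, so the keys come out in the order above
    return "".join(f"{key}={value};" for key, value in parts.items())


# Example with made-up values
print(build_connection_string("https://your-instance.example.com",
                              "example_user", "example_password"))
```

Keeping the password in a variable (or an environment variable) rather than in a saved text file is the main reason to bother with a helper like this.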
Creating the Dashboard
Dashboard creation is a multi-step process: exploring the data, deciding on a color palette, creating the visuals, formatting the labels and tooltips for consistency, and then creating a basic layout or template over which all the charts and KPI elements are placed. A template that resonates with the theme of your data helps bind everything together while maintaining the aesthetic composition.
Exploring the Data
On exploring the data, we note the following features, which will be central to building a dashboard that tells a well-rounded story:
Hashtags — hashtags used in the tweet
ID — primary key of the dataset
Language — language of the tweet
Name — name of the account tweeting
Photos — URLs of the photos attached in the tweet
Retweet — boolean flag representing whether the tweet was retweeted or not
Time — time of the tweet
Date — date of the tweet
Tweet — text of the tweet
Clean tweet — tweet after the removal of URLs, mentions, hashtags, punctuations and stopwords
Tokenized Tweet — tokenized form of the clean tweet
Username — username of the tweet author
Likes Count — number of likes on the tweet
Replies Count — number of replies on the tweet
Retweet Count — number of retweets of the tweet
Video — video attached to the tweet
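Writing down the shape of a single row can make these fields easier to keep straight while building the charts. The Python dataclass below is a hypothetical sketch: the snake_case names loosely follow the columns Twint produces (such as likes_count), while the names clean_tweet and tokenized_tweet and every type annotation are assumptions to check against your own tweets table.

```python
import datetime
from dataclasses import dataclass, field


@dataclass
class TweetRow:
    """One row of the tweets table (hypothetical field names and types)."""
    id: str                    # primary key
    username: str              # username of the author
    name: str                  # display name of the account
    date: datetime.date        # date of the tweet
    time: datetime.time        # time of the tweet
    language: str              # language code, e.g. "en"
    tweet: str                 # original text
    clean_tweet: str           # text after cleaning
    tokenized_tweet: list[str] = field(default_factory=list)
    hashtags: list[str] = field(default_factory=list)
    photos: list[str] = field(default_factory=list)  # photo URLs
    video: bool = False        # whether a video is attached (assumed flag)
    retweet: bool = False      # retweet flag as described above
    likes_count: int = 0
    replies_count: int = 0
    retweets_count: int = 0
```

Only the text and date/time fields are required; the lists default to empty (a fresh list per row, via default_factory) and the counts to zero.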
Deciding the palette
Since the data revolves around Taylor Swift and the reaction of her fans, it is only right to create a Taylor Swift themed dashboard. For this tutorial I am moving forward with the palette of her 2019 album Lover.
Creating the Visuals
- Non-English Speakers Distribution
A chart of the non-English languages people use to tweet about the American singer shows how diverse her fanbase is beyond the country she is based in.
To create this chart, drag language to the Rows shelf and tweets (data) Count to the Columns shelf. Tableau draws a bar chart. Drag tweets (data) Count onto the Marks card as well and change it from Detail to Color. Then edit the colors to purple.
Your chart should look something like this -
To focus on non-English tweets, exclude the row named en by right-clicking it and selecting Exclude. Then select Treemap from the Show Me section.
We have the first component of the dashboard ready! You can make it more informative by adding labels. I added Count as a percent of total:
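The treemap's numbers can be sanity-checked with a few lines of Python. This sketch assumes you have exported the language column as a list of codes; the function name and the sample data are made up. It drops en before computing shares because, in Tableau, an Exclude acts as a dimension filter, and dimension filters are applied before table calculations such as Percent of Total, so the percentages describe non-English tweets only.

```python
from collections import Counter


def non_english_share(languages):
    """Return (language, tweet count, percent of total) for non-English tweets.

    'en' is removed first, so percentages are shares of the remaining
    tweets, listed from most to least frequent.
    """
    counts = Counter(lang for lang in languages if lang != "en")
    total = sum(counts.values())
    if total == 0:
        return []
    return [(lang, n, 100 * n / total) for lang, n in counts.most_common()]


sample = ["en", "ja", "es", "ja", "en", "pt"]  # made-up language codes
for lang, n, pct in non_english_share(sample):
    print(f"{lang}: {n} tweets ({pct:.1f}%)")
```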
- Top 10 Hashtags
We start by creating a set of the top 10 hashtags by count. Right-click the Hashtags field and go to Create > Set.
In the set dialog, first exclude the empty tag list by unchecking it in the list of values:
Next, go to the Top tab and set the following parameters:
Click OK, then drag the set to the Rows shelf and Count to the Columns shelf. We’ll opt for a simple bar chart for this visualization.
To make the chart more visually appealing, change the view from Standard to Entire View. Next, edit the colors to purple and sort the values in descending order. For more information, add the count of each hashtag as a label on the Marks card. Your chart will look something like this:
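The Top 10 set can be cross-checked the same way. This Python sketch assumes, as the empty tag list value suggests, that each row's Hashtags field arrives as a single string such as ['#tag1'], with [] (or an empty string) meaning no hashtags; those encodings, the function name and the sample values are all assumptions. Like a set built on that string field, it ranks whole field values, so a tweet carrying two hashtags counts toward that combination rather than toward each tag.

```python
from collections import Counter

# Assumed encodings of "no hashtags" in the exported column
EMPTY_VALUES = {"", "[]"}


def top_hashtag_values(values, n=10):
    """Return the n most frequent non-empty Hashtags values with their counts."""
    cleaned = (v.strip() for v in values)
    counts = Counter(v for v in cleaned if v not in EMPTY_VALUES)
    # most_common sorts by count, keeping first-seen order for ties
    return counts.most_common(n)


sample = ["[]", "['#tag1']", "['#tag1']", "['#tag2']", "", "['#tag1', '#tag3']"]
print(top_hashtag_values(sample, n=2))
```

If you want per-hashtag counts instead, you would first parse each value into its individual tags, which measures something different from the set described in the steps.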
- Tweet distribution by Hour
For this viz, drag the Time field to the Columns shelf and Count to the Rows shelf, then select discrete lines as the visualization type.
To polish it further, increase the line width, change the color to our purple and edit the axis to create a chart like this:
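Behind the line chart is a simple count of tweets per hour. This Python sketch assumes the Time column holds HH:MM:SS strings (the function name and sample times are hypothetical) and reports every hour from 0 to 23, using 0 for hours with no tweets instead of leaving them out.

```python
from collections import Counter


def tweets_per_hour(times):
    """Return a 24-item list where index h is the number of tweets in hour h."""
    # The hour is everything before the first colon of an "HH:MM:SS" string
    counts = Counter(int(t.split(":", 1)[0]) for t in times)
    return [counts.get(hour, 0) for hour in range(24)]


sample = ["00:15:00", "13:02:11", "13:59:59", "23:00:00"]
for hour, n in enumerate(tweets_per_hour(sample)):
    if n:
        print(f"{hour:02d}:00  {n}")
```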
Third chart down! Let’s move on to the KPIs next.
- Total number of Tweets
For this KPI, drag Count onto Text on the Marks card. Then edit the sheet heading and the text label from the Marks card to create a KPI consistent with the aesthetic of the dashboard.
- Average Like Count
Next, drag Likes Count onto Text on the Marks card. Click the down arrow on its pill and change the measure from the default Sum to Average.
Then format the field to show a rounded integer, and format the remaining elements to create something similar to this:
- Total number of Unique Accounts Involved
Finally, drag Username onto Text on the Marks card. Click the down arrow on its pill and choose Measure > Count (Distinct). Format it the same way as the previous two and we’ll have our third and final KPI.
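The three KPIs reduce to a row count, an average and a distinct count. Here is a minimal Python sketch, assuming each row is a dict with username and likes_count keys (hypothetical names; match them to your table) and that there are no missing values. The average is rounded half up, which is an assumption about how you want the integer displayed.

```python
import math


def dashboard_kpis(rows):
    """Return total tweets, average likes (rounded) and unique accounts."""
    total = len(rows)
    if total == 0:
        return {"total_tweets": 0, "avg_likes": 0, "unique_accounts": 0}
    avg = sum(row["likes_count"] for row in rows) / total
    return {
        "total_tweets": total,  # like Count
        # Round half up; likes are never negative, so floor(x + 0.5) works
        "avg_likes": math.floor(avg + 0.5),  # like Average, formatted
        "unique_accounts": len({row["username"] for row in rows}),  # Count (Distinct)
    }


sample = [
    {"username": "user_a", "likes_count": 1},
    {"username": "user_b", "likes_count": 2},
    {"username": "user_a", "likes_count": 4},
]
print(dashboard_kpis(sample))
```

Python's built-in round() uses round-half-to-even (round(2.5) gives 2), which is why the sketch uses floor(x + 0.5) instead.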
Formatting
Go to Format > Workbook in the menu bar and set your default fonts and colors. Here’s what I used:
I also prefer to set the background of my worksheets to None so they blend into the dashboard template. You can do this in the Shading section of each worksheet by choosing None.
Template
Although not mandatory, it is advisable to design a dashboard with a background template. A tool such as Figma or Canva can be used to whip up a quick background template that enhances the visual appeal of the dashboard. For this dashboard, you can use the one I designed here or make your own.
Assembling the Dashboard
Create a new dashboard and set its size to match the size of the template as follows:
Next, drag in an Image object and choose your downloaded template. Once it is properly adjusted, change it from Tiled to Floating. Then drag in the charts and KPIs one by one, resizing each into place. Lastly, drag in a Text object to add a heading to your dashboard. Go to the Phone layout section and delete it, since template-based dashboards get distorted and lose their alignment when opened on a phone.
You can then publish it to your Tableau Public profile via Server > Tableau Public > Save to Tableau Public.
Here’s how mine looks:
Congratulations! You have reached the end of this tutorial. In this series, we created an end-to-end data analysis project that covered concepts such as:
- Scraping Data
- Text Preprocessing
- Utilizing Cloud Databases
- Extracting Data
- Connecting hosted data to a local tool
- Creating a Dashboard from scratch
That’s it for this time. See you on the other side!