Everyone engages in ego searching, i.e. searching for your own name on Google, and so do I. While on one such trip I came across the entire search history that Google had preserved for me since late 2013. What I present here is a little exploratory data analysis of that search history.
What I realized was that simply combing through the timestamps of the searches and breaking them down to the day, hour and weekday level said a lot about my online habits. Adding the search text into the mix pretty much illustrated the key events of my life over the last two years.
Getting Your Google Search History
If you have allowed Google to save your search history and are always logged into your Google account when searching (which you should be, since Google can then personalize the results), you can download your search history from the Google History page. Once you are on the page, navigate to the three dots in the top right corner. From there you can download your search history by clicking the Download Searches link. Google will email you your searches as a zip file containing JSON files.
Once you have the zip file (and if you want to follow along), unzip it and place the JSON files alongside this notebook.
For some reason I could not get Pandas to parse the JSON files straight into a dataframe, so I had to write a small script to convert them into CSV format. Here is the gist if you want to try it out.
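If you would rather not open the gist, here is a minimal sketch of what such a conversion could look like. It is not the script from the gist. The JSON layout it expects (an `event` list whose items hold a `query` with `query_text` and a list of `id` entries carrying `timestamp_usec`) is my assumption about the export format, and the function names `iter_searches` and `write_csv` are made up for illustration. Check one of your own files and adjust the keys if they differ.

```python
import csv
import io
import json
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def iter_searches(payload):
    """Yield (iso_timestamp, query_text) pairs from one parsed history file.

    ASSUMED layout, not confirmed by the article:
    {"event": [{"query": {"id": [{"timestamp_usec": "..."}],
                          "query_text": "..."}}]}
    """
    for event in payload.get("event", []):
        query = event.get("query", {})
        text = query.get("query_text", "")
        # One query can carry several timestamps (the same search repeated).
        for stamp in query.get("id", []):
            usec = int(stamp["timestamp_usec"])
            # Microseconds since the Unix epoch, kept in UTC.
            when = EPOCH + timedelta(microseconds=usec)
            yield when.isoformat(), text


def write_csv(payloads, out_file):
    """Write all searches from the parsed payloads as CSV; return the row count."""
    writer = csv.writer(out_file)
    writer.writerow(["timestamp", "query"])
    count = 0
    for payload in payloads:
        for row in iter_searches(payload):
            writer.writerow(row)
            count += 1
    return count


# Hypothetical sample standing in for one exported JSON file.
sample = json.loads("""
{"event": [
  {"query": {"id": [{"timestamp_usec": "1420070400000000"},
                    {"timestamp_usec": "1420074000000000"}],
             "query_text": "python pandas read json"}},
  {"query": {"id": [{"timestamp_usec": "1422748800000000"}],
             "query_text": "tableau line chart"}}
]}
""")

buffer = io.StringIO()
rows_written = write_csv([sample], buffer)
```

For real files you would pass `json.load(open(path))` for each JSON file and an opened output file instead of the in-memory buffer. Note that the timestamps stay in UTC here, so convert them to your local time zone before reading anything into the hour-of-day charts.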
Though I have used Python/Pandas and Plotly for this kind of analysis before, I used Tableau this time around. Here is a simple graph that illustrates the number of searches per day over the last two years. The graph shows an overall increasing trend, which is not surprising but not terribly useful either. I am more interested in understanding my online behavior, i.e. what time of day I am most active, and on which days and in which months I search the most.
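For readers who prefer to stay in Pandas, the same breakdowns can be sketched in a few lines. This is not how the charts in this post were made (those come from Tableau). The `searches` dataframe and its `timestamp` and `query` columns are hypothetical stand-ins for the converted CSV.

```python
import pandas as pd

# Hypothetical stand-in for the converted CSV: one row per search.
searches = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2015-01-05 09:15", "2015-01-05 23:40", "2015-01-09 14:05",
        "2015-02-01 10:30", "2015-02-01 10:45",
    ]),
    "query": ["a", "b", "c", "d", "e"],
})

stamps = searches["timestamp"]

# Count searches at each level of granularity.
per_day = searches.groupby(stamps.dt.date).size()
per_month = searches.groupby(stamps.dt.month_name()).size()
per_weekday = searches.groupby(stamps.dt.day_name()).size()
per_hour = searches.groupby(stamps.dt.hour).size()
```

Each result is a Series indexed by the grouping key, so `per_weekday.plot(kind="bar")` or a Plotly bar chart would give a rough equivalent of the charts that follow.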
Activity By Month
August and September, followed by December, are my most active months, while May and March are the least active.
Activity By Day
This visualization was a bit of a surprise, as I always thought I was much more active online on the weekends than I was during the week. It turns out that is not the case. The other big surprise is Friday. I knew I slow down by Friday but did not expect it to be by that much.
By the Hour of the Day
This looks straightforward and corresponds to my intuition, though I am a little surprised that my weekend pattern is so similar to my weekday pattern. I was expecting it to be a lot more divergent, i.e. much more subdued online activity throughout the day with peaks only in the evenings or at night. That turns out not to be the case.
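One way to check the weekday versus weekend comparison numerically is to compute each hour's share of searches separately for the two groups. The sketch below is an illustration in Pandas, not the Tableau view itself, and the timestamps in it are made up.

```python
import pandas as pd

# Hypothetical search timestamps: four on weekdays, two on a weekend.
stamps = pd.Series(pd.to_datetime([
    "2015-01-05 09:00", "2015-01-05 09:30",  # Monday
    "2015-01-06 22:00",                      # Tuesday
    "2015-01-07 09:10",                      # Wednesday
    "2015-01-10 09:00",                      # Saturday
    "2015-01-11 22:00",                      # Sunday
]))

# dayofweek runs from Monday=0 to Sunday=6, so 5 and 6 are the weekend.
day_type = stamps.dt.dayofweek.map(lambda d: "weekend" if d >= 5 else "weekday")

# Rows are hours of the day, columns are weekday/weekend, cells are counts.
hourly = pd.crosstab(stamps.dt.hour, day_type,
                     rownames=["hour"], colnames=["day_type"])

# Divide each column by its total so the two profiles are comparable
# even though there are more weekdays than weekend days.
shares = hourly / hourly.sum()
```

If the two columns of `shares` look alike, the weekend pattern really does mirror the weekday one, regardless of the difference in raw volume.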
Hourly – Over the Years
My data for 2013 (when I started collecting my search results) and 2016 is incomplete. Once we disregard the relatively sparse data from those two years, my search activity has clearly increased over the years, but what is surprising is how consistent my activity pattern has been over the course of the day. My activity around midnight is still close to my daytime average, which is a bit worrisome, since I would expect it to be much lower. Time to get real about getting to bed a little earlier and getting more sleep.
What was I searching for?
Since the Google search history also contains the actual search text, with a little bit of text mining (sklearn and nltk) it's easy to generate a snapshot of what you were looking for at a given point in life. I chose to do a simple wordcloud using the Python wordcloud package. Generating wordclouds with that package is fairly straightforward.
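To give a feel for it, here is a minimal sketch of the idea. It swaps the sklearn/nltk preprocessing for a plain `Counter` and a tiny hand-written stop list, so it is a simplification rather than the code behind the images below. The sample queries and the names `word_frequencies` and `save_wordcloud` are made up for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stop list; a real run would use a fuller one, e.g. from nltk.
STOP_WORDS = {"a", "an", "the", "in", "to", "for", "of", "and", "how"}


def word_frequencies(queries):
    """Count how often each non-stop word appears across the queries."""
    counts = Counter()
    for query in queries:
        for word in re.findall(r"[a-z0-9']+", query.lower()):
            if word not in STOP_WORDS:
                counts[word] += 1
    return counts


def save_wordcloud(frequencies, path):
    """Render the frequencies to an image file.

    Needs the third-party wordcloud package, imported here so that the
    counting step above works without it.
    """
    from wordcloud import WordCloud

    cloud = WordCloud(width=800, height=400, background_color="white")
    cloud.generate_from_frequencies(frequencies).to_file(path)


# Hypothetical queries standing in for one month of search history.
sample_queries = [
    "data analyst jobs in boston",
    "how to negotiate a job offer",
    "data scientist jobs",
]
freqs = word_frequencies(sample_queries)
```

Filtering the queries to a single month before counting, then calling `save_wordcloud(freqs, "jan_2015.png")`, would produce one snapshot per month.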
Here are a few snapshots from my life, curated from my Google search terms.
Jan 2015 (frantic job search time)
Feb 2015 (move to the US)