Disaggregating Google Trends data

In a story about US emigration, the Washington Post’s Department of Data showed this chart of search frequency for “move to Canada”:

Area chart showing Google Trends search frequency for "move to Canada" with spikes at key events.

In addition to the annotations within the chart, the text added:

After the 2004 reelection of George W. Bush, the 2020 election of Joe Biden and the 2016 election of Donald Trump, Google search interest in moving to Canada spiked

It seems like a mischaracterization to me to imply that the election of Joe Biden triggered a “move to Canada” spike. Even in their chart, the 2020 “spike” is the smallest annotated peak and is preceded by a longer ramp-up than the others, making it more of a sudden drop than a spike.

To get a better understanding, I set out to download the Google Trends data myself. The article nicely includes a link to their query, which indicates they looked at both “move to Canada” and “moving to Canada” searches. Or did they … ? Here’s the 2016 and later part of the original Google Trends chart for both queries, with “move to Canada” in red and “moving to Canada” in blue. The Post appears to have only used “move” and ignored the data for “moving”. Both are similar for 2016 but “moving” is much smaller for 2020. By only using “move” their 2020 spike is exaggerated.

Google Trends line chart with red for "move to Canada" and blue for "moving to Canada" over the past 5 years.

I can’t think of a good explanation for doing that, but I proceeded to retrieve both queries and combine them for my own charts. First, there are a few quirks regarding Google Trends data (as far as I can tell).

  1. The granularity of the data is solely a function of the date range of the query. The granularity might be monthly, weekly or daily (or finer for current data).
  2. There’s no longer any API for downloading the data; you have to click a download button to get it.
  3. The data is normalized so that the maximum value is always 100, meaning the values are relative over the search interval.

Wanting daily data over the entire range, I eventually figured out that if I asked for 8 months of data at a time, I would get daily data. That handles quirk #1. For quirk #2, I was able to write a script that opened 29 web pages (at least the date range is in the URL), each covering 8 months, and then I just had to click all the download buttons manually.
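For anyone who wants to try the same trick, here is a minimal Python sketch of that kind of script. It is not my original script: the start date, the `geo=US` setting, putting both search terms in one query, and the exact URL pattern are all assumptions for illustration, and the function names are made up.

```python
from datetime import date, timedelta
from urllib.parse import urlencode, quote
import webbrowser

def add_months(d, n):
    """Return the first day of the month that is n months after d's month."""
    idx = d.year * 12 + (d.month - 1) + n
    return date(idx // 12, idx % 12 + 1, 1)

def windows(start, n_windows, months_per_window=8):
    """Yield (first_day, last_day) for consecutive, non-overlapping windows."""
    for i in range(n_windows):
        first = add_months(start, i * months_per_window)
        last = add_months(first, months_per_window) - timedelta(days=1)
        yield first, last

def trends_url(terms, first, last, geo="US"):
    """Build an explore-page URL. The URL pattern here is an assumption."""
    params = {
        "date": f"{first.isoformat()} {last.isoformat()}",
        "geo": geo,
        "q": ",".join(terms),
    }
    return "https://trends.google.com/trends/explore?" + urlencode(params, quote_via=quote)

def open_all(terms, start, n_windows):
    """Open one browser tab per window; the downloads are still manual clicks."""
    for first, last in windows(start, n_windows):
        webbrowser.open(trends_url(terms, first, last))

# Example call (assumed start date; opens 29 tabs):
# open_all(["move to Canada", "moving to Canada"], date(2004, 1, 1), 29)
```

Starting each window on the first of a month keeps every calendar month inside a single file, which makes the later rescaling simpler.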

Quirk #3 meant all of those 29 files had values in different scales. My first strategy to overcome that was to incorporate some overlap in my queries so I could use those for alignment (like aligning tree rings to date old lumber). However, with the activity on many days being 0 or very low, the alignment wasn’t precise. Instead I used the monthly values from the original full range query to normalize the daily values within each month and then re-normalized everything back to 0 – 100.
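The monthly rescaling can be sketched roughly as follows. This is one plausible reading rather than the exact procedure: it assumes each month's daily values are scaled so that their mean matches that month's value from the full-range query, and the function name and input shapes are hypothetical.

```python
from collections import defaultdict
from datetime import date

def rescale_daily(daily, monthly):
    """Put daily values from differently scaled files onto one 0-100 scale.

    daily:   list of (date, value) pairs, each file on its own 0-100 scale
    monthly: dict mapping (year, month) to the value from the full-range query
    """
    # Group the daily rows by calendar month.
    by_month = defaultdict(list)
    for d, v in daily:
        by_month[(d.year, d.month)].append((d, v))

    scaled = {}
    for key, rows in by_month.items():
        mean = sum(v for _, v in rows) / len(rows)
        # Assumption: match the daily mean to the monthly value.
        # A month with no activity stays at zero.
        factor = monthly[key] / mean if mean > 0 else 0.0
        for d, v in rows:
            scaled[d] = v * factor

    # Re-normalize so the largest day over the whole range is 100.
    peak = max(scaled.values())
    if peak > 0:
        scaled = {d: 100 * v / peak for d, v in scaled.items()}
    return scaled
```

Because the factor is computed per month, it does not matter which of the downloaded files a month came from, as long as no month is split across two files.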

While daily data may be too granular to view over the whole 20-year span, it did provide one unexpected insight, which can be seen in this 5-year view.

Line chart of Google Trends daily search frequency for moving to Canada. Mostly near zero with a big spike for 2016 election and a much smaller spike during the 2020 election.

The single-day spike for the 2016 election is more than 10 times bigger than the one for the 2020 election. The Post chart suggests 2016 is only 4 times bigger, which is partly because they ignored one of the search queries and partly because the other days of those months weren’t that different and worked to even things out.

Zooming in on the 2020 election, we see the spike occurred on November 4, a day after the election but days before the election was called for Biden by major media outlets on November 7. So it wasn’t “after … the election of Joe Biden,” as the article stated.

Bar chart of Google Trends daily search frequency for moving to Canada during fall of 2020. Mostly near zero, with a spike on Nov 4 (relative value of 8 out of 100) and slightly elevated values before and after that. Smaller spikes at Sep 19 and Sep 30.

I also labeled the two other spikes in this period. September 19 was a day after Ruth Bader Ginsburg died. September 30 was the date of the first presidential debate.

Bonus chart: here’s daily data for November 2016 and 2020 overlaid for a more direct comparison of the two elections. There is still the issue of Election Day falling on different days.

Area chart of Google Trends data for November 2016 and 2020 showing the respective spikes regarding moving to Canada.

Just to be clear, all of my charts are showing the combined “moving to Canada” and “move to Canada” trends data.
