Last year, my Scented Candle ratings trends study examined a graph of scented candle reviews by Kate Petrova and found a few issues. This time, I look at a simple plot shared by Nick Beauchamp, showing counts of candle reviews that contained “no smell” or “no scent” in the text.
However, the chart is looking at raw counts instead of rates, and recent Covid variants are not strongly associated with loss of smell. Beauchamp effectively retracted the tweet a day later, saying “I wouldn’t take this too seriously. Even looking at percentages, there seems to be a seasonal surge in ‘no smell’ each winter.” Nonetheless, the original image still garners retweets, and Beauchamp has turned it into a conference paper, “This Candle Has No Smell”: Detecting the Effect of COVID Anosmia on Amazon Reviews Using Bayesian Vector Autoregression.
I may not be up for trying to reproduce the Bayesian vector autoregression model, but the paper does motivate me to take a look at the raw data myself. The original review data wasn’t shared, so I had to collect it anew. I saw a few guides on how to scrape Amazon review data, but there were a few potential gotchas, including the possibility of being blocked by Amazon. So instead of doing it myself, I bought a three-day pass at ExportComments and gave it 30 popular candles to collect reviews for. The pass allows 5,000 comments per product, which was enough to stretch back to at least 2018 for each candle.
After a bit of data cleaning, I ended up with 52,000 reviews of 26 candles (7 of which were Yankee Candle products), some going back to 2005 (but few before 2017). The biggest cleaning step was realizing that some candles had twins: separate listings that share the same reviews, so those reviews showed up twice in my collection. For instance, Yankee Candle Balsam & Cedar reviews are identical to the Yankee Candle White Christmas reviews. I'm not sure why that is; presumably it’s some overgeneralization of the way versions of the same product with different packaging share the same reviews.
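For anyone repeating the cleaning step, here is a minimal Python sketch of how twin listings could be spotted and collapsed. It is not the procedure I actually ran. It assumes each exported review carries some stable identifier (the `review_id` values and the function names below are hypothetical) and treats two products as twins when their sets of review ids are identical.

```python
from collections import defaultdict


def find_twin_products(reviews):
    """Group products whose sets of review ids are identical.

    `reviews` is a list of (product, review_id) pairs. Both fields are
    hypothetical stand-ins, not the actual export columns.
    """
    ids_by_product = defaultdict(set)
    for product, review_id in reviews:
        ids_by_product[product].add(review_id)

    # Products that map to the same frozen set of ids are twins.
    products_by_ids = defaultdict(list)
    for product, ids in ids_by_product.items():
        products_by_ids[frozenset(ids)].append(product)

    return [sorted(group) for group in products_by_ids.values() if len(group) > 1]


def drop_twin_reviews(reviews):
    """Keep each review id once, attributed to the first product seen."""
    seen = set()
    kept = []
    for product, review_id in reviews:
        if review_id not in seen:
            seen.add(review_id)
            kept.append((product, review_id))
    return kept


# Toy data for illustration only.
sample = [
    ("Balsam & Cedar", "r1"), ("Balsam & Cedar", "r2"),
    ("White Christmas", "r1"), ("White Christmas", "r2"),
    ("Vanilla Cupcake", "r3"),
]
twins = find_twin_products(sample)
deduped = drop_twin_reviews(sample)
```

Matching on exact set equality is deliberately strict. If two listings were collected at slightly different times, their review sets could differ by a few entries, and an overlap threshold would be a more forgiving test.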
I uploaded the cleaned data to github/xangregg/data. It includes the extracted features for each of the 52,000 reviews with a small codebook in the readme.
My first step in any chart study is to reproduce the original chart, and that’s especially valuable here where the data collection is different. Here are the no-scent/no-smell counts over the same period as the original chart.
The trend line shapes match very well, as do many of the extreme values in a relative way. My absolute counts are over twice as high, mostly because I’m using more reviews. The paper used about 10,000 reviews of four Yankee Candles, and I collected about 21,000 for this period.
Switching to rates instead of counts and expanding the time period and “no smell” variations, I shared this version on Twitter.
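To make the counts-versus-rates distinction concrete, here is a small Python sketch that tallies monthly no-smell mentions both ways. The regular expression is an assumption covering only “no smell” and “no scent”, not the full set of variations I used, and the (date, text) input layout is hypothetical.

```python
import re
from collections import Counter

# Assumed pattern: matches only "no smell" / "no scent", case-insensitively.
NO_SMELL = re.compile(r"\bno\s+(?:smell|scent)\b", re.IGNORECASE)


def monthly_no_smell(reviews):
    """Return {month: (count, total, rate)} for a list of (date, text) pairs.

    `date` is an ISO string such as "2020-11-03"; the month key is "2020-11".
    """
    totals = Counter()
    hits = Counter()
    for date, text in reviews:
        month = date[:7]
        totals[month] += 1
        if NO_SMELL.search(text):
            hits[month] += 1
    return {m: (hits[m], totals[m], hits[m] / totals[m]) for m in sorted(totals)}


# Toy data for illustration only.
sample = [
    ("2019-12-02", "Lovely, strong scent."),
    ("2019-12-15", "No smell at all."),
    ("2020-12-01", "There is no scent whatsoever"),
    ("2020-12-05", "no smell, very disappointed"),
    ("2020-12-09", "Great candle"),
    ("2020-12-20", "Burns evenly"),
]
stats = monthly_no_smell(sample)
```

In this toy data the December count doubles from one year to the next while the rate stays at one half, because the number of reviews doubled too. That is the effect a raw-count chart cannot separate from a real change.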
The late year peaks are still visible but not as pronounced, and they also occur to some degree in prior years, which is the reason for the caution quoted above. How do no-smell reviews relate to Covid waves?
I don’t see a connection between the no-smell rates and the individual Covid waves, but the no-smell rates are up overall since before 2020, which might reasonably be Covid-related since loss of smell was a common symptom for early variants.
What accounts for the late-year surges in no-smell reviews, anyway? Digging further, do all candle scents show that trend? Here’s a breakdown of the Yankee Candle scents.
It appears that the fall surges are not universal and that Balsam & Cedar is the main source. Remember from my cleaning step above that Balsam & Cedar shares reviews with the White Christmas scent. That at least gives a linkage to the holiday season. The fall surges would make sense if White Christmas always has more no-smell complaints since it surely represents a greater proportion of the combined reviews in the fall.
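The mixture argument can be checked with a little arithmetic. In the Python sketch below, every number is made up for illustration. The rates and seasonal shares are assumptions, not measurements from my data. The point is only that a pooled rate rises when the scent with more complaints makes up more of the pool, even if neither scent's own rate changes.

```python
def combined_rate(rate_a, rate_b, share_b):
    """No-smell rate of a pooled listing in which scent B supplies
    a fraction `share_b` of the reviews and scent A the rest."""
    return (1 - share_b) * rate_a + share_b * rate_b


# Hypothetical per-scent rates, held constant all year.
balsam_rate = 0.02
white_christmas_rate = 0.08

# Hypothetical seasonal shares of White Christmas in the pooled reviews.
summer = combined_rate(balsam_rate, white_christmas_rate, share_b=0.1)
fall = combined_rate(balsam_rate, white_christmas_rate, share_b=0.6)

print(f"summer pooled rate: {summer:.3f}")
print(f"fall pooled rate:   {fall:.3f}")
```

With these assumed inputs the pooled rate roughly doubles from summer to fall purely through the change in mix.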
Vanilla Cupcake also has a fall surge, but a smaller one. Its reviews happen to be combined with Apple Pumpkin, which I would guess to be another scent more popular in the fall.
I scanned the low-scoring reviews for other common complaints and noticed quite a few involving broken glass canisters. Here’s that trend over time.
Wow, a prominent bump in 2020. I can believe that to be a result of the stressed shipping system during the lockdowns of 2020.