Stats of Excellence 2: The Restatening


I’m giving up predicting things. My predictions for the Brazil CoE, frankly, did not correlate well with reality. So this addendum will focus on simple observations, while clairvoyance will take a backseat.

One part I felt I gave short shrift to was the descriptors. Because of the tedium of having to go into each individual farm page to retrieve the cupping descriptors, my sample size was relatively low. Now, having acquired a nice little script to compile all the cupping descriptors, my sample size has increased to 1,281 coffees – the number of coffees that have gone to auction since 2003 (no cupping information is available on the site prior to that year). Having the complete post-2003 set gives us a chance to look at some trends since then, and also at some interesting observations, like the most words used to describe a single coffee.

This honour goes to the number 2 coffee in the 2008 Colombia auction – La Gloria. Oddly, the 4 highest all come from the 2008 Colombia auction, where even the last-placed auction coffee had a massive 60 descriptors.

  1. La Gloria (Colombia 2008 #2) – 109
  2. Villa Loyola (Colombia 2008 #1) – 104
  3. El Libano (Colombia 2008 #3) – 93
  4. El Encanto (Colombia 2008 #4) – 89
  5. Burmera Mig (Rwanda 2008 #1) – 85

In fact, the number of descriptors used has been increasing since 2003, apparently peaking in 2008. However, 2009 saw a dip across the board.


I’m not sure if there was a concerted effort in 2009 to simplify matters, though sometimes descriptions can be too simple, like the 2 coffees that received the joint lowest number of descriptors – 1 each. Both of these coffees appeared in the 2003 Brazil CoE, where the jury appeared to be stuck for words, or perhaps (most likely) someone got tired of typing them up. Coffees #38 and #42, Fazenda São Marcos and Sítio Primavera, are described merely as floral and spicy respectively. Further words may have created a clearer picture, unlike


(Serra do Rola Moça – Brazil 05), which is one of many words that were used only once and that leave me none the wiser. I’m also not entirely sure how a coffee can be naughty (Santa Elena II – El Salvador 03) or unshakable (Los Delirios – Nicaragua 04). Meanwhile, I find descriptors like plumeria (Guayacanera – Colombia 05), nasturtium (Finca Carrizal – Costa Rica 07) and persimmon (El Portillo Oscuro – Honduras 05) to be specific to the point of obscurity. I can only assume that sweet ditch chocolate aftertaste (Finca Santa Lucia – Nicaragua 04) was a typo, while big phat cup (Burmera Mig – Rwanda 08) was surely a product of the CoE’s misguided keepin’ it real campaign.

At the other end of the spectrum, in a group of 1,281 coffees, the word sweet was used 1,195 times. Its near ubiquity makes it in essence redundant, although 88 times it was accompanied by the adverb very. Does this mean that the other 1,107 times it was used, it was just referring to averagely sweet coffees? Chocolate was the second most popular word, used 821 times. Here’s a pretty word cloud generated from all descriptors collected, showing the most popular words.
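For anyone wanting to reproduce this kind of tally, a minimal sketch of the counting step might look like the following. It is not the script used for this post; the `coffees` list is made-up sample data and `word_counts` is a hypothetical helper name. The simple `[a-z]+` pattern also ignores accented characters, which is fine for the descriptors but not for farm names.

```python
from collections import Counter
import re

# Hypothetical sample data: one descriptor string per coffee.
coffees = [
    "sweet, chocolate, very sweet finish",
    "floral, sweet, orange",
    "chocolate, clean",
]

def word_counts(descriptions):
    """Count how often each lowercase word appears across all descriptions."""
    counts = Counter()
    for text in descriptions:
        # Split each description into plain lowercase words.
        counts.update(re.findall(r"[a-z]+", text.lower()))
    return counts

counts = word_counts(coffees)
print(counts.most_common(2))  # the two most frequent words with their counts
```

Because the count is per word rather than per coffee, a word that appears twice in one description is counted twice, which is how a descriptor can end up being used more often than there are coffees.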


Although it is somewhat interesting to bundle all the descriptors into one bucket like this, perhaps more interesting is when we separate them out into groups and look at them. If we separate the descriptors by country it is possible we may get some indication of elements of


specific to, or at least more common in, individual countries. I am reminded of Tristan Stephenson’s effort at proposing a coffee flavour map, which, while a nice idea, was too easy to pick enormous holes in. Here, perhaps, we can make some humble observations in that spirit.

As we already said, sweet is the most commonly used descriptor. In fact, in Guatemala and Nicaragua its average use is more than once per coffee (1.12 and 1.10 respectively). However, in two countries, Bolivia and Rwanda, it is not the top descriptor. In Bolivia it appears on average only 65 times per 100 coffees. Conversely, and perhaps coincidentally, in Bolivia and Rwanda the descriptor orange appears 65 and 91 times per 100 coffees respectively, while in all the other countries it ranges between 15 and 32 times per 100.

The descriptor clean is another interesting one to focus on. It appears most frequently in Guatemala, at 40 times per 100 coffees. In the other American countries it ranges from 20 to 32 times per 100 coffees. In Rwanda – 4 times! That is to say, in the entire, solitary Rwandan CoE, which auctioned 23 coffees, the word clean was used only once.
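The "times per 100 coffees" figures are just occurrences divided by the number of coffees, scaled to 100. A small sketch of that arithmetic, using only numbers quoted in this post (the function name `per_100` is mine, for illustration):

```python
def per_100(occurrences, n_coffees):
    """Occurrences of a descriptor per 100 coffees."""
    return 100 * occurrences / n_coffees

# "clean" was used once across the 23 Rwandan coffees.
rwanda_clean = per_100(1, 23)
print(round(rwanda_clean))  # 4

# "sweet" was used 1,195 times across all 1,281 coffees.
overall_sweet = per_100(1195, 1281)
print(round(overall_sweet, 1))  # 93.3
```

Normalising this way is what lets a 23-coffee auction be compared with countries that have several years of results behind them, though a single occurrence in a small sample obviously moves the rate a lot.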

That is perhaps an indictment of that aspect of Rwanda’s CoE coffees, but several positive attributes were more frequently associated with the Rwandan coffees. Maple occurred 11 times more frequently in the Rwandan sample than in an average of the other countries, toffee 10 times, heavy 6 times, and tea 5.5 times more frequently. Floral was the most common descriptor from the Rwandan CoE. It appeared 25 times, despite there being only 23 coffees (in fairness, it can appear as both an aroma and a flavour note).

Perhaps the timely addition of a summarizing


will convey these figures in a more readily consumed format.

[Image: Screen shot 2010-01-28 at 22.42.12]

Make of that what you will. Note that Rwanda was omitted from the calculations for the other countries because, being a strong outlier, it would skew the data. So when, for example, we see that grass is ten times more commonly used to describe a Guatemalan coffee, it is ten times more likely than the average of all the other American coffees.
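The "N times more common" comparison described above can be sketched roughly as follows: take one country's rate and divide it by the mean rate of the remaining countries, leaving out any outlier. This is an assumption about the method rather than the exact calculation behind the figures, and `relative_frequency` is a hypothetical name. The Guatemala and Rwanda values for clean come from this post; the Brazil and Colombia values are invented placeholders within the quoted 20 to 32 range.

```python
def relative_frequency(rates, country, exclude=()):
    """Ratio of one country's rate to the mean rate of the other countries.

    rates maps country name -> descriptor occurrences per 100 coffees.
    Countries listed in exclude are left out of the baseline average.
    """
    others = [
        rate for name, rate in rates.items()
        if name != country and name not in exclude
    ]
    baseline = sum(others) / len(others)
    return rates[country] / baseline

# "clean" per 100 coffees. Brazil and Colombia are made-up placeholder values.
clean_rates = {"Guatemala": 40.0, "Brazil": 20.0, "Colombia": 30.0, "Rwanda": 4.0}

ratio = relative_frequency(clean_rates, "Guatemala", exclude={"Rwanda"})
print(ratio)  # Guatemala's rate relative to the non-Rwandan average
```

With these sample numbers the baseline is 25 per 100, so Guatemala comes out at 1.6 times the average. Including Rwanda's 4 in the baseline would drag the average down and inflate every other country's ratio, which is the skew the omission avoids.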

To complete this exercise, and believe me, at this point this is really starting to feel like an exercise, I will compare descriptor frequencies by year. This throws up another multitude of at first seemingly interesting, though on reflection probably not really, statistics.

Oak was used 3 times in the 2003 El Salvador CoE, specifically toasted oak. Oak did not reappear in a single auction until Brazil 09, where it appeared once.

Layered or multi-layered appeared 4 times in 2004 (3 of those in Honduras), yet not again a single time until Bolivia 09.

In noticing that Belgium appeared twice in 2004, in the El Salvador CoE, as Sweet Belgium Chocolate, I found that (apparently) in error the number 3 and 5 coffees have exactly the same descriptors. Data entry fail.

Positive was used 4 times in 2005, once each in Bolivia, Brazil, Honduras and Nicaragua, and once again in Guatemala 06. At no other time has it been used.

Sophisticated also appeared 3 times in 2005, and at no other time.

Sauvignon has been used only 3 times, ever, 2 of which were in Colombia 06, the other Honduras 06.

3 coffees merited Meyer lemon in Bolivia 07, the only other time it was used being Bolivia 09.

The number 1 and 2 coffees in El Salvador 07 were described as celestial. No other coffee has been described as that.

Fruit basket and extra (as in extra fruity, extra long finish) appeared 3 times each, only in 08. Extra was limited to Brazil 08.

Malic is a descriptor that seems to have really captured people’s imagination recently. It first appeared in 2008, popping up 6 times. It was used 25 times in 2009! Suddenly, everything is malic.


What does all this show? Do I have to draw a conclusion? Conclusions are dangerous. Descriptors are a funny business. It’s really about perception. Certain judges probably like certain words, or pick up certain things more easily. One man’s persimmon is another man’s plumeria, one man’s Meyer lemon is another man’s normal lemon. Maybe it does take 109 words to sufficiently describe a coffee, maybe Rwandan coffee is relatively dirty.

Maybe everything is becoming increasingly malic.


*As with the last post, the word cloud was generated with Wordle. I’d also like to thank Fergus Moloney for his help with scripting to get all this data somewhat manageable.


7 thoughts on “Stats of Excellence 2: The Restatening”

  1. Pingback: Descriptors « Dear Coffee, I Love You. | Caffeinated Inspiration.

  2. It seems to me that the judges’ different cultural backgrounds will affect how they associate different perceived flavours in coffee with other things.

    A Norwegian judge would most certainly have other flavour descriptors than an American or a Japanese judge on the same coffee.

    Also, a person’s other background will affect this. If one person has worked with wine or has a hang-up on microbrewery beer, their descriptors would be affected by this.

    The human factor plays a big role in this.

    Love the follow-up! It DOES look like an exercise indeed.

  3. This could be turned into an amazing infographic on coffee descriptors. I especially love the Wordle… those are so effective at really showing what is most important in text visually.

  4. Pingback: Data Visualization: Word Clouds « Mobius Art and Science Initiative
