Hello all, I know it’s been a while, but I’m starting to write/use more social media again.
Some of my current research:
I’m in the midst of analyzing the XKCD color-naming database, the largest color naming survey ever completed.
The hard part about color is that in order to do anything meaningful, like finding the ‘average’ color or computing the standard deviation, one has to convert from the way a computer represents color (RGB) to a (mostly) perceptually uniform color space. [I’m currently implementing a converter to a much more complicated color space, CIECAM02, for a related project. I’ll throw that up here when I finish.]
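To make the conversion step concrete, here is a minimal sketch of going from 8-bit RGB to CIE L*a*b* and then averaging there. It is not the code from the repository: it assumes standard sRGB primaries and a D65 white point, whereas the actual analysis uses measured LCD primaries and a different illuminant. The function names are made up for illustration.

```python
# Assumption: standard sRGB primaries and D65 white, NOT the measured LCD
# primaries or the illuminant used in the real analysis.
SRGB_TO_XYZ = (
    (0.4124, 0.3576, 0.1805),
    (0.2126, 0.7152, 0.0722),
    (0.0193, 0.1192, 0.9505),
)
D65_WHITE = (0.95047, 1.0, 1.08883)


def srgb_to_linear(c):
    """Undo the sRGB gamma curve for one channel in [0, 1]."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def _f(t):
    """The piecewise cube-root function from the CIE L*a*b* definition."""
    delta = 6 / 29
    return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29


def rgb_to_lab(rgb, white=D65_WHITE):
    """Convert an 8-bit (r, g, b) triple to (L*, a*, b*)."""
    lin = [srgb_to_linear(v / 255) for v in rgb]
    xyz = [sum(m * c for m, c in zip(row, lin)) for row in SRGB_TO_XYZ]
    fx, fy, fz = (_f(v / w) for v, w in zip(xyz, white))
    return (116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz))


def mean_lab(rgb_colors):
    """Average a list of RGB colors channel by channel in L*a*b* space."""
    labs = [rgb_to_lab(c) for c in rgb_colors]
    return tuple(sum(ch) / len(labs) for ch in zip(*labs))


if __name__ == "__main__":
    print(rgb_to_lab((255, 0, 0)))
    print(mean_lab([(255, 0, 0), (0, 0, 255)]))
```

Averaging in L*a*b* rather than in raw RGB is the point: equal distances in L*a*b* are meant to correspond to roughly equal perceived differences, so the mean and standard deviation mean something.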
You can find the Python code on GitHub:
This code can:
- Convert RGB to CIE L*a*b* space for any initial configuration of illuminant and color primaries
- Graph the World Color Survey stimulus set in color, i.e. the outer skin of the Munsell color solid, in 3D L*a*b* space
- Graph any set of color categories against the Munsell chips in 3D
- Compute the average color for each color term in a list
- Snap every response for a given color term to the nearest Munsell chip, and then return a distribution showing which chips occur most often
- Count the number of words in every color name that contains a prespecified word. This way, I can look to see if the number of words used to name a color that also contains ‘yellow’ (for example) is a function of brightness.
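The last two items in the list can be sketched roughly as follows. This is an illustration, not the repository code: the chip names and L*a*b* coordinates are toy values rather than real Munsell data, the function names are hypothetical, and plain Euclidean distance in L*a*b* is assumed as the notion of "nearest".

```python
from collections import Counter
import math


def nearest_chip(lab, chips):
    """Name of the chip whose L*a*b* coordinates are closest to `lab`."""
    return min(chips, key=lambda name: math.dist(lab, chips[name]))


def chip_histogram(responses, chips):
    """Count how often each chip is the nearest one to a response."""
    return Counter(nearest_chip(lab, chips) for lab in responses)


def word_counts_for(names, keyword):
    """Map each color name containing `keyword` (lower-case) to its word count."""
    return {n: len(n.split()) for n in names if keyword in n.lower().split()}


if __name__ == "__main__":
    # Toy chips: hypothetical names and coordinates, not real Munsell values.
    chips = {"chip_a": (50, 0, 0), "chip_b": (50, 60, 40), "chip_c": (80, 0, 80)}
    responses = [(52, 5, 3), (49, 55, 35), (51, 58, 44), (78, 2, 70)]
    print(chip_histogram(responses, chips).most_common())
    print(word_counts_for(["yellow", "pale yellow", "blue"], "yellow"))
```

`Counter.most_common()` gives the chips sorted by how often they were hit, so the mode is simply the first entry.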
The big concern with this data is that it was collected in the most unscientific way possible. However, given the large number of people that responded (roughly 220,000), I’m assuming some of the variance will be washed out.
From the start, I only use responses that came from users who reported having an LCD screen.
The largest issue with this data is that people took the survey under different lighting conditions. However, it’s pretty clear that the vast majority of users took it under normal, good old filament light bulbs, the kind that the ‘A’ illuminant attempts to capture. Another issue is how each computer screen generates each color; I’m waving my hand at all of that and using values for LCD color primaries as measured in an academic paper. (The source can be found in the code.)
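For anyone wondering how the illuminant and the primaries enter the picture: together they determine the 3x3 matrix that takes linear RGB to XYZ, which is the first step on the way to L*a*b*. Below is a rough sketch of the standard construction. The primaries here are the standard sRGB chromaticities, used purely as a placeholder for the measured LCD values from the paper (which I am not reproducing here), and the function names are made up. The illuminant A chromaticity is the CIE 1931 2° value.

```python
# Placeholder assumption: sRGB chromaticities stand in for measured LCD primaries.
PLACEHOLDER_PRIMARIES = ((0.64, 0.33), (0.30, 0.60), (0.15, 0.06))  # R, G, B as (x, y)
ILLUMINANT_A = (0.44757, 0.40745)  # CIE 1931 2-degree chromaticity of illuminant A
D65 = (0.3127, 0.3290)


def xy_to_xyz(x, y):
    """Chromaticity (x, y) to XYZ with Y normalised to 1."""
    return (x / y, 1.0, (1.0 - x - y) / y)


def det3(m):
    """Determinant of a 3x3 matrix given as three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def solve3(m, v):
    """Solve the 3x3 linear system m @ s = v with Cramer's rule."""
    d = det3(m)
    return [
        det3([[v[r] if c == col else m[r][c] for c in range(3)] for r in range(3)]) / d
        for col in range(3)
    ]


def rgb_to_xyz_matrix(primaries, white_xy):
    """Linear-RGB to XYZ matrix for the given primaries and white point."""
    cols = [xy_to_xyz(*p) for p in primaries]  # XYZ of each primary
    p = [[cols[c][r] for c in range(3)] for r in range(3)]  # primaries as columns
    scale = solve3(p, xy_to_xyz(*white_xy))  # so that RGB (1, 1, 1) maps to white
    return [[p[r][c] * scale[c] for c in range(3)] for r in range(3)]


if __name__ == "__main__":
    for row in rgb_to_xyz_matrix(PLACEHOLDER_PRIMARIES, ILLUMINANT_A):
        print(row)
```

Swapping the white point from D65 to illuminant A changes every entry of the matrix, which is why the lighting assumption matters for everything downstream.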
Also, from a pure linguistics standpoint, there are interesting patterns in the actual color names regardless of what the referents are.
So far, in addition to simply mapping various color categories in 3D, I’ve looked at two things.
First, I wanted to see how well the results from the World Color Survey lined up with the XKCD database. The World Color Survey and subsequent analysis have revealed that around the world, people agree on what the best examples of blue, green, yellow and red are. It is assumed that some aspect of these best examples, or focal colors, is universal, though this may turn out to have more to do with universal color boundaries.
Additionally, I was curious about word (contrastive focus) reduplication. Sometimes the same word is repeated twice, but has a different meaning each time. For example, “I ordered a salad salad, not the egg salad” (the number of linguistic examples that take place in restaurants or are about food is unsettling). Reduplication also occurs within the color-naming realm: “That’s a really blue blue”, for instance. With the salad example, the repetition is known to signify the prototypical item, and I thought that it might also pick out the prototypical colors. In other words, “Blue Blue” from the XKCD database should line up with the best example data compiled by the World Color Survey.
Curiously, even after roughly 5 million responses, no one named a color “Yellow Yellow”. This fits a trend in which yellow is the least frequent color term in various linguistic corpora. This may be because yellow is already rather bright on its own and doesn’t need to be accented, but I don’t really know.
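Picking reduplicated names out of a list of responses is simple enough to sketch. This is only an illustration with a made-up function name and toy input, assuming each response is a plain string and that "reduplicated" means exactly two identical words, ignoring case.

```python
from collections import Counter


def reduplicated_terms(names):
    """Return the names that are one word said twice, lower-cased."""
    found = []
    for name in names:
        words = name.lower().split()
        if len(words) == 2 and words[0] == words[1]:
            found.append(" ".join(words))
    return found


if __name__ == "__main__":
    # Toy responses, not real survey data.
    responses = ["Blue Blue", "blue", "green green", "dark blue", "blue blue"]
    counts = Counter(reduplicated_terms(responses))
    print(counts)
    print("yellow yellow" in counts)
```

Checking whether a term like “yellow yellow” ever appears is then just a membership test on the resulting counter.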
That’s all I have time for, but I’ll be posting actual data soon.
[For a less dry time, follow me on Twitter!]