
June 16, 2005


Rick Thomas

Clay is a good writer and offers many insights, but on these topics he has an unfortunate tendency toward a political style of argument. Typically he creates a straw man - the Hierarchic Taxonomy - and conflates it with Ontology and the Semantic Web.

Imagine a Victorian armchair scientist who decides to catalog the world's food ingredients. Drawing from the leather-bound volumes in his library, he and his assistant write out the first hundred cards, and they have a problem - retrieval. So he makes a command decision (being recently retired from Her Majesty's service) to sort the cards first by Animal-Vegetable-Mineral and to chart the subdivisions below. Maybe he perseveres through his long retirement to build a curious if not very useful reference.

This is Clay's mythical Hierarchic Taxonomy. It was born in a time with different information processing constraints and survives in cases where it truly matches the problem at hand. The hierarchy of species helps to understand evolution. Maintenance of modular assemblies is guided by a hierarchic reference. But no one uses it for general information management.

A century later a retired gentleman undertakes the same task. He writes note cards at the public library and types them into a flat data file on his second-generation personal computer - to make searching easy. He puts fields in his database including the name of the foodstuff, cuisines that use it, cost, and AVM - AVM referring to one of the labels Animal, Vegetable, or Mineral.

Now there's no hierarchic classification needed and our gentleman has invented Tags! In fact the technology of the day doesn't support hierarchy but tags are quite natural.

A few decades later a group of culinary arts students decides to create an open catalog of the world's recipes. One of the first challenges is to design an ontology of foodstuffs so that cooks around the planet can describe their diverse local ingredients.

Does this mean they need to make a "global" classification scheme? Of course not. An ontology is merely a flexible database schema, first delineating some useful category - foodstuffs - and then stating an open list of things that one might say about a foodstuff - name, cuisine, AVM, etc.
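A rough Python sketch of that idea, one category plus an open list of things one might say about its members. The class name, the say() helper, and the okra sample values are all assumptions made up for illustration, not part of any real ontology language.

```python
from dataclasses import dataclass, field

@dataclass
class Foodstuff:
    # The category itself is the only fixed part of the schema.
    name: str
    # Open list of statements: any property may be added later.
    properties: dict = field(default_factory=dict)

    def say(self, prop, value):
        # Record one more thing that can be said about this foodstuff.
        self.properties.setdefault(prop, []).append(value)

# Hypothetical sample data.
okra = Foodstuff("okra")
okra.say("cuisine", "Creole")
okra.say("cuisine", "Indian")
okra.say("avm", "Vegetable")
okra.say("local_name", "bhindi")  # a property nobody planned for up front

print(okra.properties)
```

Nothing here forces a global hierarchy; a cook elsewhere can add a property without asking anyone's permission.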

Would tagging accomplish the same thing? Not as currently conceived. First, Clay seems to reject any classification of the problem domain. While practitioners naturally think of different sets of tags for, say, ingredients and recipes, apparently this two-level hierarchy is too much. Second, tags are not as expressive as a database field because the field name carries semantics - Dollars-per-pound: 15 vs. the tag "15DPP" ??
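To make the field-versus-tag contrast concrete, here is a small Python sketch. The record layout, the field names, and the "lamb" sample are assumptions for illustration; only the 15 dollars-per-pound figure and the "15DPP" tag come from the example above.

```python
# Hypothetical records: the same ingredient described two ways.
tagged = {"name": "lamb", "tags": {"animal", "15DPP"}}
fielded = {"name": "lamb", "avm": "Animal", "dollars_per_pound": 15}

def cheaper_than_fielded(record, limit):
    # The field name tells the program what the number means.
    return record["dollars_per_pound"] < limit

def cheaper_than_tagged(record, limit):
    # With bare tags the program has to guess a convention such as "<number>DPP".
    for tag in record["tags"]:
        if tag.endswith("DPP") and tag[:-3].isdigit():
            return int(tag[:-3]) < limit
    return None  # no tag followed the guessed convention

print(cheaper_than_fielded(fielded, 20))
print(cheaper_than_tagged(tagged, 20))
```

Both calls can be made to work, but the tagged version only works for taggers who happen to follow the same private spelling of the price.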

Going further our students make ontologies for ingredients, recipes, utensils, and skills using semantic web technologies. In time the ad hoc standards they establish catch on with influential grocers, publishers, chefs, manufacturers, and schools. This creates an open ecosystem in which we all become better cooks.

The key feature of the semantic web is to be able to combine, search, and assemble data from all sources. The semantic web is scalable; tagging is not because it is a muddle. Tags are certainly part of this picture. But if tagging is to evolve to do real work it will have to take on more structure and it will begin to look like a distributed open database - a lot like the semantic web.
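As a sketch of what "combine, search, and assemble" could look like, the Python below merges statements from two sources. It is loosely modeled on the subject-property-value triples used by semantic web formats, but it uses plain tuples rather than any real RDF library, and the sources, property names, and values are made up for illustration.

```python
# Each hypothetical source publishes (subject, property, value) statements.
grocer = {("okra", "dollars_per_pound", 3), ("okra", "avm", "Vegetable")}
chef = {("okra", "cuisine", "Creole"), ("gumbo", "uses", "okra")}

# Because both sources share the same statement shape, combining is a set union.
combined = grocer | chef

def about(subject, statements):
    # Collect everything any source said about one subject.
    return {prop: value for s, prop, value in statements if s == subject}

print(about("okra", combined))
```

The merge is trivial only because the two sources agreed on the shape of a statement and on property names, which is exactly the structure that bare tags lack.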

Think: which would you prefer to eat - a nicely structured dish, or a big tag mush?

Rick Thomas

Clay starts with the premise that all classification schemes and all tags are transient. No doubt in some absolute sense. But let's be real; no one is pursuing a "theoretically perfect view of the world". In most cases there's a plain and obvious view of the world that just lacks organization. Everybody knows the world changes constantly, but stable classifications are common, practical tools, not binary-mindedness. Ontology and the semantic web (and typical XML web applications for that matter) are useful distributed database applications, not the descendants of the antique absolute Taxonomy.

Clay seems to have some hope that, because of the simplicity of tags and thus their profusion, tagging will lead to some new kind of emergent applications, which incidentally will vanquish all classification. Maybe.

But more likely we'll want to extract classifications from them - blatant categorization. That's what comes to mind when you suggest using Clay's "tag signatures" to identify different kinds of information. Likewise when Clay observes that tags are often correlated. The structure teased out of tags will start to look a lot like ontologies: a commonly observed class and an open list of things that might be said about members of that class.
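One way to picture "teasing out" structure is to count which tags turn up together. The Python sketch below does only that; the tagged items are invented, and simple co-occurrence counting is my stand-in here, not a description of Clay's "tag signatures".

```python
from collections import Counter
from itertools import combinations

# Hypothetical tagged items.
items = [
    {"recipe", "creole", "okra"},
    {"recipe", "indian", "okra"},
    {"ingredient", "vegetable", "okra"},
    {"recipe", "creole", "shrimp"},
]

pair_counts = Counter()
for tags in items:
    # Count every unordered pair of tags that appear on the same item.
    for pair in combinations(sorted(tags), 2):
        pair_counts[pair] += 1

def co_tags(tag):
    # Tags seen alongside `tag`, with how often they were seen together.
    result = Counter()
    for (a, b), n in pair_counts.items():
        if a == tag:
            result[b] += n
        elif b == tag:
            result[a] += n
    return result

print(co_tags("recipe").most_common())
```

Tags that keep company with "recipe" but never with "ingredient" start to look like properties of a class, which is the ontology shape again.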

You would like to "look at the flow of meaning through an information space". But Clay insists dogmatically that "tag semantics are in the users. This is not a way to inject linguistic meaning into the machine." I suspect this is one reason that Clay is hot for tags: tagging naively short-circuits the basic methods of representing semantics in data. Politics, ugh.

Really, semantic web technology is your friend so don't throw it out before you see how it complements tagging. Ontology may be flexibly extended so it doesn't restrict what we can say; tags may be expressed unchanged. Ontology is also adopted socially, admittedly with a heavier cost and requiring more mature commitment compared to the teen-like flightiness of tagging. Ontology guides expression that is more akin to natural language than the truncated notion of "tag".

Jon Husband

I'm thinking that this is one of those domains/areas where *both/and* will evolve, with probably relative degrees of *loose/tight* in terms of applicability.

And it will become the practice to "cook your own", in an environment of chacun à son goût ... which is what tags, generally, can add to the recipes ... no?
