The more I think about the Semantic Web, the more it sounds like very earnest barking up the wrong tree. I have always been leery of Metadata and especially that defined by the writer of the material in question. The whole point of Metadata is to in some way create a container for the content so that it can easily be identified by others. This is the supermarket model for the web; by putting all the tins of beans together we can make selection more efficient, drive up the total sales and we know this because the labels and the tins all tell us that these are beans.
I have every sympathy for the supermarket owners; they need this classification system and so do we. But that isn't the net and never will be. It is a commercialised view of the world that, above all, tries to brand information. It allows me to label things in ways that will help MY information stand out from other people's. Now naturally the semantic web people want to mandate truth in advertising and all those play-nice things, but it will take about 10 minutes for someone to start gaming the system and it will go to hell in a handbasket. While it is possible to game Google, it is much harder and requires an overt conspiracy among many participants. At which point the gaming itself becomes a game, but not a tool to gain unfair advantage.
The whole point, as far as I can see, is that semantics tries to make intrinsic to the document something that is inherently extrinsic (semantics is a description of a language, it is NOT an instruction kit). Words are internal to the document, but the meaning is contextual and mostly external.
On the other hand, a hologram is a much better model. You can focus till you go blind on the data in the hologram and you will be no wiser; to extract its meaning you have to shine a light through it and look away from it. What's more, it scales.
First, the containers stuff
The whole idea of containers for information is being superseded and so it should be. This is especially so when the person who creates the content gets to control the container. The value of the document is determined not by its author, but by the people who read it. It is their opinions and actions that matter, not those of its promoter. That is why Google works, it measures the economy of the Internet, the currency of which is linking, and those with the most currency (inbound links) float to the top.
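To make the "currency" idea concrete, here is a minimal Python sketch that counts inbound links and sorts destinations by that count. The page names and the `links` mapping are invented for illustration, and raw inbound-link counting is a deliberate simplification, not a description of how Google actually ranks pages.

```python
from collections import Counter

# Hypothetical link data: each page maps to the pages it links to.
links = {
    "alice.example": ["carol.example", "dave.example"],
    "bob.example": ["carol.example"],
    "carol.example": ["dave.example"],
    "dave.example": ["carol.example"],
}


def inbound_counts(links):
    """Count how many distinct pages link to each destination."""
    counts = Counter({page: 0 for page in links})
    for source, targets in links.items():
        for target in set(targets):  # ignore repeated links from one page
            if target != source:     # a page cannot vote for itself
                counts[target] += 1
    return counts


def rank_by_inbound(links):
    """Order pages by inbound links, most linked first (ties by name)."""
    counts = inbound_counts(links)
    return sorted(counts, key=lambda page: (-counts[page], page))


if __name__ == "__main__":
    for page in rank_by_inbound(links):
        print(page, inbound_counts(links)[page])
```

Note that nothing in the sketch reads a page's own description of itself; the ordering comes entirely from what other pages do.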
Another problem with the semantic web is that it appears to use properties which attach themselves to documents, but the processes of information management are busy detaching the content from the container.
Even the latest version of Office uses applications such as Word or Excel essentially as filters through which to view the data. You pull the data from a separate file, pass it through the application which extracts information in certain ways and presents it according to its own capabilities. For example, if I look at a file through Word I might get a document with headings; if I look at it through PowerPoint I might get only the headings, while the text becomes the speaking notes and the handout.
More importantly, we cannot know in advance what the meaning of a document actually is. Unless we can define it both so precisely as to make its role in the discourse very clear and so generally that disciplines we have not yet invented can subsume it in the future, we are wasting our time. Investing THAT amount of work simply to enable machines to fit the document into a matrix is a wholesale waste of energy.
In any case, a great deal of what we know comes about by accident, such as Fleming's discovery of penicillin. If the metadata of those petri dishes had been "failures to be thrown out", that is what they would have remained. The ability to see new things in unexpected places, especially among the very familiar, is crucial to progress. Serendipity is the child of ambiguity and serendipity is the stuff of the net.
But language and communication are also highly structured, and that is what has been bugging me about people who claim that search engines "only find x% of the documents on the net", a problem that the Semantic Web is supposed to solve. It is both true and irrelevant, because no document exists in isolation; it is riven with pathways to and from other places in our knowledge map. Even if the search engines only find 10% of the documents, they reference (pick a number above) 90% of the information.
The language is meaningless unless it is completely open to the whole context of the language in which it is written. If you don't believe that, try reading this. Language removed from context is mostly just noise, and I love James Joyce. Then a document has to be open to the whole history of the subject that it deals with, all its debates, arguments, schisms and revolutions; they are all referenced even in arguments that want to delete the whole idea of the subject.
The reason most of the unfound stuff hasn't been found is that not enough people think it matters. Maybe it does, maybe not, we will only find out as time goes by. But the illusion that, if only we could have access to everything ever written, we would be better informed is, and will remain, an illusion. 100% of everything we know is almost nothing. If it is true that the total of human knowledge doubles every ten years or so then everything I knew 10 years ago is theoretically out of date, but it doesn't matter, because everything I can know now is still built on it.
The net is the real-world analogue of Borges's Library of Babel, in which he says, "The Library is a sphere whose exact center is any one of its hexagons and whose circumference is inaccessible." Whenever we connect to the net we become the centre of an indeterminate sphere where every other user is both an end from our perspective and their own centre. But Borges's library has another secret: every book in it is separated from millions of other books by only the merest whisker of a difference. Perhaps one letter, or a comma, is different from thousands of others. Imagine the Encyclopedia Britannica published in 20 million different versions, each of which differs from all the others by only a single letter in one word somewhere. Which would we say was the "definitive" version? Would it matter? Welcome to the net.
Not Semantic but Holographic
If you cut a hologram in half you don't get half an image, you get a fuzzy one. Cut it in half again and you get an even fuzzier one. The whole of the image is encoded in every part of the piece of film. In other words, it scales. I don't know how small the piece has to become before the fuzziness degenerates into a blob indistinguishable from all other holograms. The web also scales: until you know how much of it you are looking at, you can't tell how much of it you are looking at. More to the point, you can't learn anything from looking at the traffic or the nodes; all you will see is that lots of traffic is moving and some nodes handle more of it than others. (I'm also doing some thinking about Jonathan Schull's Macroscope Manifesto and I'll try to get something out on that too shortly.)
A hologram is an interference pattern, and the more interference you get, the fewer places a given part of the image can occupy, so the more precise and well defined it becomes. If you consider links as interference between information, then you will get what I mean. The more links among certain documents, the more clearly their relationship is defined, and within that, rating and ranking become more and more clear. Ten links tell me very little, a million links tell me a lot more, and at ten million links I start to get some confidence that the destinations floating to the surface are important. I don't need to know what the content is beyond a few keywords that they share; all I need to know is that people with an interest in these things have reached something approaching a consensus that this destination is important.
That, however, is metadata that has emerged from the individual acts of millions of web publishers, each of whom is making their own, largely independent evaluations and pushing some destinations inexorably upwards while demoting others to the periphery. The result is meaning that does not inhere in any one node or any amount of traffic; Google instead reads the relationships among the documents and projects the results on the results page. The more interference a document has with other documents on the net, the more clearly its place in the web is defined. Documents that don't interfere much remain fuzzy, unfocused in the results and part of the background noise. But the image is not contained in the document, nor is it contained in the links; it is revealed when Google shines the light of its algorithms through that dense pattern of interference, just like a hologram.
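One way to picture ranking that lives in the pattern of links rather than in any one document is a simplified PageRank-style iteration, sketched below in Python. This is an illustration under stated assumptions, not Google's actual algorithm: the `web` graph and its page names are made up, and the damping value of 0.85 and the fixed iteration count are conventional choices rather than anything taken from this post.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Simplified PageRank-style scores computed by repeated redistribution.

    `links` maps each page to the list of pages it links to. Each round,
    every page passes a share of its current score to the pages it links
    to, so a link from a well-linked page is worth more than a link from
    an obscure one.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    rank = {page: 1.0 / n for page in pages}  # start with equal scores
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outbound links spreads its score evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical miniature web, invented for illustration.
web = {
    "a.example": ["hub.example"],
    "b.example": ["hub.example"],
    "c.example": ["hub.example", "a.example"],
    "hub.example": ["a.example"],
}

if __name__ == "__main__":
    scores = link_rank(web)
    for page in sorted(scores, key=scores.get, reverse=True):
        print(f"{page}: {scores[page]:.3f}")
```

No page in this sketch carries a label saying how important it is. The scores exist only after the whole graph has been read, which is the sense in which the image is in neither the document nor the links.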
Now, there are those who will say, "Yes, but maybe some of those fuzzy documents are terribly important." Which may be true. Some of Archimedes' later works were discovered buried as a palimpsest in a Byzantine prayer book and finally recovered after a huge effort in forensic imaging. And yes, if we had known all along what he knew at his death, perhaps we would have reached the moon 100 years ago; we might also have nuked ourselves to bits shortly afterward. The same applies to Newton's work on optics that he forgot to mention for 30 years. But wishing it were otherwise is nuts, and wasting our time looking for a magic bullet in some obscure document is also pretty dubious.
I think that what we have in the net is a hologram of the sum of human knowledge, but at this stage, while we can make out some general propositions and some areas are more comprehensible than others, the lens through which we see it is still pretty rudimentary.
As we get better at this, and as we get more useful tools that enable better annotation, ranking and rating, and stuff like Technorati that lets us assemble our own cosmos of trusted people and then let that intersect with the information, more of it will come into focus.
BTW, I also have a bee in my bonnet about annotation, ranking and rating and what I call horizons. Stuff on that coming up.

This work is licensed under a Creative Commons License.

I am working on the semantic web and holography in relation to H.T. Goranson's article on soft logic, Sir Thomas Harriot and C.S. Peirce (thirdness/context). I'm using Cortazar's two novels as examples of optimum interactivity (Hopscotch) and sociohistoric context (Manual for Manuel).
Posted by: D. Emily Hicks | February 19, 2005 at 08:12 PM