September 2002

Wordsworth London Bicentennial

Today is the 200th anniversary of Wordsworth’s ode to London (luckily he could not afford to ride inside the coach, so got a view worth writing about...).

It is also the 101st anniversary of the raising of the flag of the then shiny-new Australian commonwealth (3 September 1901). As it happens, when it came to making up quizzes for some testing software at work, I whimsically chose the Australian flag as a topic, which meant I ended up drawing my own. The one up at the top right there is the 1901 edition; it is largely similar to the current National flag—can you spot the differences? (See also my entry for 31 May.)

Two silly suggestions for RSS 2.0

I’ve been thinking about the various RSS branches and the RSS-2.0 process and sent this to Mark Pilgrim and Dave Winer. Since then I have been trying to sort out a draft recommendation. It’s trickier than I thought...

Version numbers in namespaces considered harmful

The various flavours of RSS offer a variety of namespace requirements:

URL                                          RSS version
http://my.netscape.com/rdf/simple/0.9/       0.9
(default)                                    0.91, 0.92, 0.94
http://purl.org/rss/1.0/ and
http://purl.org/rss/1.0/modules/rss091#      1.0
http://backend.userland.com/rss2             2.0

In my opinion, it is a grave mistake to include a version number in a namespace URI. The function of a namespace is to prevent accidental collisions between names defined by different people (or organizations) when two XML vocabularies are combined in one document. The version number of the format can be specified separately (as indeed all the RSS versions do, as an attribute of their root element). If the 0.9 spec had only used http://netscape.com/1999/rss as its namespace (following the lead of http://www.w3.org/1999/xhtml) then all the versions could have used the same namespace.

Why do I care, you ask? Because if you use actual XML tools like XSLT to manipulate RSS feeds, then the fact that there are three or four namespaces in use for essentially the same elements makes the whole thing more complicated. Where I might have had

<xsl:template match="/rss:rss/rss:channel">...

I had better be prepared for more complicated expressions like

<xsl:template match="/*/*[local-name()='channel']">...

(to allow for any root element, and to ignore the namespace of the channel element). This clutters the XSLT file and makes it harder to maintain—and probably also less efficient. Sigh.
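The same workaround is needed outside XSLT. Here is a minimal Python sketch, using the standard library's ElementTree, that finds the channel element whatever namespace it happens to be in. The two sample feeds and the helper names (local_name, find_channel, channel_title) are made up for illustration; only the namespace URLs come from the table above.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts on qualified tags."""
    return tag.rsplit('}', 1)[-1]


def find_channel(xml_text):
    """Return the first child of the root whose local name is 'channel'."""
    root = ET.fromstring(xml_text)
    for child in root:
        if local_name(child.tag) == 'channel':
            return child
    return None


def channel_title(xml_text):
    """Return the text of the channel's title element, or None."""
    channel = find_channel(xml_text)
    if channel is None:
        return None
    for child in channel:
        if local_name(child.tag) == 'title':
            return child.text
    return None


# Hypothetical sample feeds: one with no namespace, one in the RSS 1.0 namespace.
PLAIN = '<rss version="0.91"><channel><title>A</title></channel></rss>'
RSS10 = (
    '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" '
    'xmlns="http://purl.org/rss/1.0/">'
    '<channel><title>B</title></channel></rdf:RDF>'
)
```

Every lookup has to go through local_name, which is exactly the clutter a single shared namespace would have avoided.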

This is not unique to RSS, by the way: I had all sorts of hassle with early versions of SVG tools which were caught out by the ever-changing SVG namespace URL. They finally settled on the quite-sane http://www.w3.org/2000/svg.

Topic Maps

I’ve been reading about Topic Maps, an ISO standard that has been augmented with an XML representation.

Topic maps are very similar to RDF, in that both are all about graphs of topics (representing real-world subjects) connected with associations. The difference is that the topic-maps paradigm seems easier to understand. Maybe it's because they draw a distinction between the topics and the subjects they stand in for, whereas RDF tends to conflate the two. Or maybe it's the way a few important relationships (like occurrence and instanceOf) are treated specially in topic maps, which makes maps a little less bewilderingly generic.

Topic maps have a system of using URIs to stand in for particular abstract subjects. Separate topic maps using the same URI http://www.topicmaps.org/xtm/language.xtm#en as the subject indicator for the English language know they are referring to the same thing. When they are merged, the corresponding topics will be combined automatically. One of the activities of various topic-map committees is creating published subject indicators for various generically useful types of topic, in order to promote interoperability between topic maps.
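As a rough illustration of subject-based merging, here is a minimal Python sketch. It assumes a drastically simplified model in which a topic is just a subject-indicator URI plus a set of names; real topic maps also carry scopes, occurrences and associations, and their merging rules are more involved. The function name, the data layout and the French-language URI are invented for the example.

```python
def merge_topic_maps(*topic_maps):
    """Merge topic maps, combining topics that share a subject indicator.

    Each topic map is a list of dicts with a 'subject' URI and a set of
    'names'. The result maps each subject URI to the union of its names.
    """
    merged = {}
    for topic_map in topic_maps:
        for topic in topic_map:
            merged.setdefault(topic['subject'], set()).update(topic['names'])
    return merged


# Subject indicator for the English language, as quoted in the text.
EN = 'http://www.topicmaps.org/xtm/language.xtm#en'
# Hypothetical indicator, made up for this example.
FR = 'http://psi.example.com/hypothetical/language#fr'

map_a = [{'subject': EN, 'names': {'English'}}]
map_b = [
    {'subject': EN, 'names': {'anglais'}},
    {'subject': FR, 'names': {'French'}},
]

merged = merge_topic_maps(map_a, map_b)
```

The two English topics collapse into one because they share a URI; nothing in the code needs to know what the URI means.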

Other (meta)data systems use URIs to represent subjects: RDF does (using a weird convention where XML element-names turn into URIs), RSS 0.9x/2.0 does (inasmuch as category names may be interpreted relative to a domain specified by a URL). It would be kind of cool if we could all agree to use the same subject identifiers, so our various efforts interoperate as much as possible.

Time to add a second Ethernet card?

My network at home uses what is now old-fashioned coax with BNC connectors. New computers (such as, for example, a visiting iBook) only have connectors for new-style cat-5e cables. Rather than replace my entire home network, I’m experimenting with a second NIC and a crossover cable. This means powering down my desktop (uptime: up 170 days, 15:55).

Update: some hours later I have managed to get my first card working again, after Red Hat’s auto-configuration clobbered my settings! My attempts to load the module for card #2 have so far been unsuccessful. I have downloaded a newer kernel (I was running 2.2.12, this is 2.2.22), which is currently compiling. Maybe it will do better. ¶ It has now finished after 15½ minutes. I will not try rebooting until tomorrow, however.

Update (Sunday): Turns out that kernel 2.2.22 hangs on boot-up. Argh. Returned to 2.2.12 for now. On the other hand, I discovered that make modules modules_install is different from merely make modules_install. It appears that the install target does not imply actually building the modules? Anyway, I did this and I now have an rtl2039.o that can be loaded with modprobe. Alas! Trying to bring up the interface complains of a ‘Resource temporarily unavailable’, which I take to mean that the card needs to have an IRQ assigned to it, something that may be doable through the BIOS? Something to try later in the week. I have other things to do today.

Thinking about Topic Maps and dates

I had an idle thought about using topic-maps’ processing model (or even topic maps themselves) to represent the information encoded in RSS and RSS data resources. The attraction is that topic maps have a concept of merging built in, so writing an aggregator would in principle be straightforward.

Obviously stories are topics, and categories are topics. What about dates? These become what in topic maps are called occurrences (the concept of occurrence is stretched a little). I assume any topic-map query-engine is willing to do grouping and sorting on topics according to occurrence values?

This got me thinking about dates in metadata. Most metadata examples I’ve seen use what might be called free-form dates. This is perhaps OK within one, isolated database, but when I am merging two topic-maps, how does my computer know how to compare dates in random formats like 13 Oct 02 and 2002-09-22? I would rather not rely on the cute guessing games that programs like Microsoft® Excel resort to (e.g., if I enter 12-09-2002 and 13-09-2002 in separate cells, they end up holding 9 December and 13 September).
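To make the ambiguity concrete, here is a small Python sketch that parses the same strings under two explicit conventions. The function names are made up for the example, and this says nothing about how Excel itself decides; it only shows that the string alone does not determine the date.

```python
from datetime import date, datetime


def parse_day_first(text):
    """Parse DD-MM-YYYY, the reading a British writer would intend."""
    return datetime.strptime(text, '%d-%m-%Y').date()


def parse_month_first(text):
    """Parse MM-DD-YYYY, the reading a US-style parser would try first."""
    return datetime.strptime(text, '%m-%d-%Y').date()


ambiguous = '12-09-2002'
as_british = parse_day_first(ambiguous)     # 12 September 2002
as_american = parse_month_first(ambiguous)  # 9 December 2002
```

The neighbouring string 13-09-2002 only has one valid reading, since there is no month 13, so parse_month_first raises ValueError for it. A guessing parser that falls back to day-first in that case produces the inconsistent pair of dates described above.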

My suggestion is that the occurrence types should subclass special topics that are conventionally used for dates in particular formats. These special topics in turn would require published subject indicators. I have created a page that contains PSIs for Date-Time occurrence types to illustrate the idea. Note! This is just for discussion. Also, it really needs an attached machine-processable metadata resource (in XTM, say).
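A rough Python sketch of how a program might use such indicators: each date-format PSI maps to a parsing rule, so an occurrence value can be turned into a comparable date once its type is known. The PSI URIs below are hypothetical placeholders, not the ones on the page mentioned above, and the function and table names are also invented.

```python
from datetime import datetime

# Hypothetical PSIs for date-format occurrence types (placeholders only).
PSI_BASE = 'http://psi.example.com/hypothetical/date-format#'
ISO_DATE = PSI_BASE + 'iso-date'            # e.g. 2002-09-22
SHORT_DATE = PSI_BASE + 'day-month-year'    # e.g. 13 Oct 02

# Each format PSI maps to a strptime pattern.
# Note: %b matches month abbreviations in the current locale (English assumed).
DATE_FORMATS = {
    ISO_DATE: '%Y-%m-%d',
    SHORT_DATE: '%d %b %y',
}


def parse_occurrence(type_psi, value):
    """Parse an occurrence value according to the format its type declares."""
    try:
        pattern = DATE_FORMATS[type_psi]
    except KeyError:
        raise ValueError('unknown date-format PSI: ' + type_psi)
    return datetime.strptime(value, pattern).date()


earlier = parse_occurrence(ISO_DATE, '2002-09-22')
later = parse_occurrence(SHORT_DATE, '13 Oct 02')
```

Once both values are real dates they sort and compare properly, whichever format each source map happened to use.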

Another slightly weird approach would be to reify the dates. That is, create topics representing the dates themselves. The relationship between a story and the date it is published on is then represented as an association between the story topic and the date topic. Date topics would use PSIs with a format like http://psi.example.com/2002/date/#2002-09-22 or http://psi.example.com/2002/date/?d=2002-09-22 (the latter has the advantage of being able to run a check on the format of its param). Probably less efficient than using occurrences.
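A minimal sketch of the reified-date idea in Python, assuming the fragment-identifier form of the example PSI above. It builds the PSI for a date and recovers the date from a PSI, rejecting anything malformed. The function names are invented, and the check here runs on the client side rather than on the server as the query-string variant would allow.

```python
import re
from datetime import date

# Example PSI prefix taken from the text; psi.example.com is not a real service.
PSI_PREFIX = 'http://psi.example.com/2002/date/#'
_PSI_PATTERN = re.compile(re.escape(PSI_PREFIX) + r'(\d{4})-(\d{2})-(\d{2})$')


def date_to_psi(d):
    """Return the subject indicator for the topic representing date d."""
    return PSI_PREFIX + d.isoformat()


def psi_to_date(psi):
    """Recover the date from a date PSI, raising ValueError if malformed."""
    match = _PSI_PATTERN.match(psi)
    if match is None:
        raise ValueError('not a date PSI: ' + psi)
    year, month, day = (int(part) for part in match.groups())
    # date() itself raises ValueError for impossible dates such as month 13.
    return date(year, month, day)


psi = date_to_psi(date(2002, 9, 22))
```

Two stories published on the same day would then be associated with the same date topic, so merging maps groups them for free.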

The future is now!

Today and for the next three weeks we are showing the last four episodes of Black Butterflies. Because we were away in Canada for a fortnight, we are posting some of the episodes early. If you want to maintain the weekly suspense between episodes, please simply refrain from following each link until the date written underneath it. Sorry for any confusion.