Wikipedia Machine Tag Generator

This little web gizmo is a proof of concept for generating "machine tags" to describe any text content. I built it at Hack Day, a 24-hour programming fandango in London (I describe the experience here).

I plan to document, improve and build upon this app in the near future. For now, here's a quick explainer:

About machine tags

Machine tags work like regular keyword tags but have a special and very precise meaning to software apps. The general format was devised by the mapping community as a way to tag content with geographical data (for example, these two tags describe the latitude and longitude of the Eiffel Tower: geo:lat=48.8582, geo:lon=2.29448).

The best and worst thing about tags is their flexibility and fluid meaning. Does content tagged "green" refer to the color, to clean energy or to the political party? Machine tags instead seek to assign very specific meanings that software can understand without ambiguity.

The cool thing about machine tags is that they can pack a lot of value and information into a single tag — value that can be used to construct interesting collections of unambiguously related info (like a location on a map, in our Eiffel Tower example, or photos of a specific airplane model using the "aero" machine tag format at Flickr).
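To make the format concrete, here's a minimal sketch of how software might split a machine tag into its parts. The names "namespace", "predicate" and "value" for the three parts, the MachineTag type and the parse_machine_tag function are labels chosen for this illustration, and the regular expression is an assumed grammar rather than an official specification.

```python
import re
from typing import NamedTuple


class MachineTag(NamedTuple):
    """The three parts of a tag such as geo:lat=48.8582 (illustrative names)."""
    namespace: str
    predicate: str
    value: str


# Assumed grammar: namespace:predicate=value, where the namespace and
# predicate are simple identifiers and the value is everything after "=".
_TAG_RE = re.compile(r"^([A-Za-z][A-Za-z0-9_]*):([A-Za-z][A-Za-z0-9_]*)=(.+)$")


def parse_machine_tag(tag: str) -> MachineTag:
    """Split a machine tag into its parts, or raise ValueError."""
    match = _TAG_RE.match(tag.strip())
    if match is None:
        raise ValueError(f"not a machine tag: {tag!r}")
    return MachineTag(*match.groups())


# The two Eiffel Tower tags from the example above.
lat = parse_machine_tag("geo:lat=48.8582")
lon = parse_machine_tag("geo:lon=2.29448")
print(float(lat.value), float(lon.value))
```

A plain keyword like "green" doesn't match the pattern, so the parser rejects it instead of guessing at a meaning.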

A machine-tag format for Wikipedia

It occurred to me that Wikipedia could be the basis of an all-purpose machine tag format to describe just about anything in the universe. Wikipedia aspires to have topics on everything, so creating tags associated with Wikipedia topics could be a foolproof way to disambiguate content.

It's the Dewey Decimal System for the Web. Call it the Wiki Decimal System.

The format looks like this:

wikipedia:language=topic

For example, here's the URL of the English-language Wikipedia page on chainsaw sculpture. The Wikipedia id is the last part of the URL, Chainsaw_Sculptures, and the language code is the "en" at the start of the hostname:

http://en.wikipedia.org/wiki/Chainsaw_Sculptures

That means that the resulting machine tag for chainsaw sculpture is:

wikipedia:en=Chainsaw_Sculptures
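The URL-to-tag step is mechanical enough to sketch in a few lines. This is only an illustration of the format, not the generator's actual code: the function name wikipedia_machine_tag is made up, and it assumes article URLs of the form http://LANGUAGE.wikipedia.org/wiki/TOPIC.

```python
from urllib.parse import urlparse


def wikipedia_machine_tag(url: str) -> str:
    """Build a wikipedia:language=topic tag from a Wikipedia article URL."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    if not host.endswith(".wikipedia.org"):
        raise ValueError(f"not a Wikipedia URL: {url!r}")

    # Assumption: the language code is the first label of the hostname.
    language = host.split(".")[0]

    # Assumption: the topic id is everything after the /wiki/ prefix.
    prefix = "/wiki/"
    if not parsed.path.startswith(prefix):
        raise ValueError(f"not an article URL: {url!r}")
    topic = parsed.path[len(prefix):]
    if not topic:
        raise ValueError(f"no topic in URL: {url!r}")

    return f"wikipedia:{language}={topic}"


print(wikipedia_machine_tag("http://en.wikipedia.org/wiki/Chainsaw_Sculptures"))
# prints: wikipedia:en=Chainsaw_Sculptures
```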

Tagging any content with that tag offers a machine-readable way to specifically associate that content with the topic of "chainsaw sculpture." Assuming that the format is actually used, you could then easily jump straight from Wikipedia to a Flickr search of photos related to chainsaw sculpture.
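As a rough sketch of that jump, the snippet below builds a Flickr photo-search request for a Wikipedia machine tag. It assumes Flickr's REST endpoint and the machine_tags parameter of its flickr.photos.search method behave as I understand them; the function name flickr_search_url and the YOUR_API_KEY placeholder are made up for the example, and nothing here confirms that any photos actually carry this tag.

```python
from urllib.parse import urlencode

# Assumption: Flickr's REST API endpoint.
FLICKR_REST_ENDPOINT = "https://api.flickr.com/services/rest/"


def flickr_search_url(machine_tag: str, api_key: str) -> str:
    """Build a photo-search URL restricted to one machine tag."""
    params = {
        "method": "flickr.photos.search",
        "api_key": api_key,           # placeholder; a real key is required
        "machine_tags": machine_tag,  # e.g. wikipedia:en=Chainsaw_Sculptures
        "format": "json",
        "nojsoncallback": "1",
    }
    # urlencode percent-escapes the ":" and "=" inside the tag.
    return f"{FLICKR_REST_ENDPOINT}?{urlencode(params)}"


print(flickr_search_url("wikipedia:en=Chainsaw_Sculptures", "YOUR_API_KEY"))
```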

The tag generator

Intrigued by the idea, I decided to build the Wikipedia Machine Tag Generator to create Wikipedia machine tags from any text content.

The tag generator finds all of the phrases that seem important in your text and displays each phrase as a library catalog card. Click the tab of the card to view it (and again to return it to the stack).

Each card contains three links to Wikipedia pages related to that phrase. If any of those pages are not important to your text, click the link’s delete button to remove it. If the entire phrase isn’t relevant, you can likewise remove the entire card from the stack. The list of corresponding machine tags is updated as you make these changes.

Want more tags? Just submit new text, and more cards are added to the stack.
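For the curious, here's one way the card stack could be modeled. This is a hypothetical sketch, not the generator's real code: the Card and Stack classes, their method names and the sample topic ids are all invented for illustration, and the tags are assumed to be English-language unless stated otherwise.

```python
from dataclasses import dataclass, field


@dataclass
class Card:
    """One important phrase plus the Wikipedia ids suggested for it."""
    phrase: str
    topics: list[str] = field(default_factory=list)


@dataclass
class Stack:
    cards: list[Card] = field(default_factory=list)

    def add_card(self, card: Card) -> None:
        """Submitting new text adds more cards to the stack."""
        self.cards.append(card)

    def remove_topic(self, phrase: str, topic: str) -> None:
        """Delete a single Wikipedia link from a card."""
        for card in self.cards:
            if card.phrase == phrase and topic in card.topics:
                card.topics.remove(topic)

    def remove_card(self, phrase: str) -> None:
        """Drop an entire card whose phrase isn't relevant."""
        self.cards = [c for c in self.cards if c.phrase != phrase]

    def machine_tags(self, language: str = "en") -> list[str]:
        """Recompute the tag list from whatever links remain."""
        tags: list[str] = []
        for card in self.cards:
            for topic in card.topics:
                tag = f"wikipedia:{language}={topic}"
                if tag not in tags:  # skip duplicates across cards
                    tags.append(tag)
        return tags


stack = Stack()
stack.add_card(Card("chainsaw sculpture", ["Chainsaw_Sculptures", "Chainsaw"]))
stack.add_card(Card("Eiffel Tower", ["Eiffel_Tower"]))
stack.remove_topic("chainsaw sculpture", "Chainsaw")
print(stack.machine_tags())
```

Recomputing the list from the remaining cards on every change keeps the displayed tags in sync without tracking which tag came from which link.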

Give it a try yourself: The Wikipedia Machine Tag Generator.

Feedback

Let me know your thoughts! Shoot me an email at jclark@globalmoxie.com, or comment on this blog post.