Technology Review - Published By MIT

Wikipedia to Add Meaning to Its Pages

The online encyclopedia is exploring ways to embrace the semantic Web.

By Tom Simonite

Wednesday, July 07, 2010

As a global resource built from the spare time of millions of volunteers, Wikipedia may be the epitome of Web 2.0. But the Wikimedia Foundation, a nonprofit organization that runs Wikipedia, among other projects, is now thinking about how to make it a linchpin of Web 3.0, or the semantic Web.

That means making some of the data on Wikipedia's 15 million (and counting) articles understandable to computers as well as humans. This would allow software to know, for example, that the numbers shown in one of the columns in this table listing U.S. presidents are dates. That could, in turn, allow applications that draw on Wikipedia to automatically generate historical timelines or answer the kind of general knowledge questions that would usually entail a person finding and reading a relevant entry on the site.

At the 2010 Semantic Technology conference in San Francisco last month, the foundation's deputy director, Erik Möller, and colleague Trevor Parscal, a user-experience developer for Wikimedia, showed some first steps taken by the foundation to explore how more semantic structure might be added to Wikipedia. They also appealed to the semantic Web community to help develop ways to make Wikipedia's knowledge more accessible to computers and software.

"Semantic information already exists in Wikipedia, and people are already building on it," says Möller. "Unfortunately, we're not really helping, and they have to use extensive processing to do so."

One example is DBpedia, a semantic database built using software that collects data from the site's pages, and maintained by the Free University of Berlin and the University of Leipzig, both in Germany. Another is Freebase, a for-profit knowledge database, much of which was also sourced by scraping Wikipedia. Freebase is the data source used by the question-answering search engine Powerset, which was acquired by Microsoft to be part of its Bing search engine.

The first targets for Möller and Parscal are the "infoboxes" that appear as summaries on many Wikipedia pages, and the tables in entries, such as this one showing the gross national product of all the countries in the world.

"Just being able to reuse that data within Wikipedia would be a big thing," says Yaron Koren, who runs a consultancy that specializes in Semantic MediaWiki, an extension to the MediaWiki software used to build Wikipedia. "The manual work that goes into maintaining the many tables and lists today could be eliminated," he adds. Instead, lists could be automatically generated from the infoboxes of other pages. It would also be possible to generate maps, using the location coordinates that feature on some pages, or automatically generate timelines to summarize periods in history covered by many other pages, says Möller.
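To make the idea concrete, here is a minimal sketch of how a list or timeline might be generated once infobox fields are available as structured records. It is only an illustration: the `INFOBOXES` records, the field names `office` and `took_office`, and the helpers `build_list` and `render_timeline` are assumptions made for this example, not part of MediaWiki or any Wikimedia API.

```python
from datetime import date

# Hypothetical structured infobox records; the field names are illustrative only.
INFOBOXES = [
    {"page": "Thomas Jefferson", "office": "President of the United States",
     "took_office": date(1801, 3, 4)},
    {"page": "George Washington", "office": "President of the United States",
     "took_office": date(1789, 4, 30)},
    {"page": "John Adams", "office": "President of the United States",
     "took_office": date(1797, 3, 4)},
]


def build_list(infoboxes, office):
    """Return (date, page) pairs for one office, sorted chronologically."""
    rows = [
        (box["took_office"], box["page"])
        for box in infoboxes
        if box["office"] == office
    ]
    # Dates are typed values, so sorting is chronological rather than alphabetical.
    return sorted(rows)


def render_timeline(rows):
    """Format each (date, page) pair as one line of a plain-text timeline."""
    return [f"{when.isoformat()}  {page}" for when, page in rows]


if __name__ == "__main__":
    rows = build_list(INFOBOXES, "President of the United States")
    for line in render_timeline(rows):
        print(line)
```

The point of the sketch is that once the software knows a field holds a date, the ordering and formatting come for free, and no one has to maintain the list by hand.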

Comments

  • semantic web
    Is it just me, or does the semantic web seem to be an automated 1984? These are the relationships, and that is the end of the conversation. Who gets to say what the relationship is, and when the conversation is closed or open? This is a giant step backward in the use of tech. I used an open-ended ontology tool (which I wrote) to find this article. Does someone have an argument that this is not the coming of 1984?

    dashriprock
    07/07/2010
    Posts:1
    Avg Rating:
    1/5
    • Re: semantic web
      The relationships are offered, not imposed by force. Calm down.

      StriatedPatt...
      07/07/2010
      Posts:7
      Avg Rating:
      4/5
    • Re: semantic web
      It's about an open and tool-supported _option_ to use community-developed ontologies/vocabularies.

      What comparison are you trying to make with 1984? Newspeak (i.e., terminology)? (obligatory) Viewscreens (i.e., monitoring)? (obligatory)

      Perhaps you need to put on your thinking cap a little longer...

      BarryNorton
      07/08/2010
      Posts:2
      Avg Rating:
      5/5
  • uberblic.org is already doing it live
    The article mentions DBpedia and Freebase, but there is also a new player in the arena that looks very promising, namely: http://uberblic.org/

    Have a look at the demo video: uberblic is able to watch the IRC notifications of Wikipedia edit events and update its knowledge base accordingly:

      http://www.youtube.com/watch?v=JctenmbFevk

    ogrisel
    07/07/2010
    Posts:1
    Avg Rating:
    5/5
  • Reading between the lines
    Jimmy Wales, a couple of years ago, already summarily dismissed semantic web for Wikipedia.  I wonder if he'd change his mind?

    Regardless, I've set out building a Semantic MediaWiki site that now numbers over 60,000 pages and over 2,500 registered editors.  I think Jimmy Wales is a boob.

    http://www.MyWikiBiz.com

    thekohser
    07/07/2010
    Posts:2
    Avg Rating:
    4/5
    • Re: Reading between the lines
      The man who thought that paid editors were the only way to make an online encyclopedia? Thankfully he's shown willing to change his mind before...

      BarryNorton
      07/08/2010
      Posts:2
      Avg Rating:
      5/5
  • Genealogy using Semantic MediaWiki
    Familypedia, a genealogy wiki, has been using Semantic MediaWiki, Semantic Forms, and Semantic Drilldown since mid-2009. Currently 180,000 total pages and growing at several hundred per week with only half a dozen really active volunteers and dozens of casuals adding ancestors.

    When a user adds or edits a page about an individual, the data is in form fields, with the software creating or editing "facts" (a fact being a combination of a "property" and a value). A later query gets selected facts displayed in a chosen arrangement.

    http://familypedia.wikia.com/wiki/Help:Semantic_MediaWiki/demo_query-range shows an example of the result. It lists all articles of people born in the 1500s that have images, tabulated with columns for image name, birth date, death date, father, mother, and "children by first marriage". Each column can be clicked to sort in predefined order, such as birth date. The site has over 900 "properties", most of which could form a column in such a table.

    http://familypedia.wikia.com/wiki/Semantic_MediaWiki/demo_query-subquery lists all Familypedia people born in Pennsylvania whose father was stated to be born between 1849 and 1875. Handy if a surviving fragment of a letter from a great-aunt or great-uncle (without a name, just addressed to "My dear great-nephew") said "I was born in PA" and "Dad emigrated in 1875" and "his parents were married in 1849".

    Any user of the wiki can construct such a search based on a combination of known facts.
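As a rough illustration of the fact model described in this comment (an editorial sketch, not Familypedia's or Semantic MediaWiki's actual implementation), each fact can be treated as a page, a property, and a value, and the subquery becomes a set intersection. The `Fact` type, the property names, and the "Person A" through "Person E" records below are all made up for the example.

```python
from typing import NamedTuple


class Fact(NamedTuple):
    """One fact: a property and its value, attached to a page."""
    page: str
    prop: str
    value: object


# Made-up example pages, not real Familypedia records.
FACTS = [
    Fact("Person A", "father", "Person B"),
    Fact("Person A", "birth_place", "Pennsylvania"),
    Fact("Person B", "birth_year", 1850),
    Fact("Person C", "father", "Person D"),
    Fact("Person C", "birth_place", "Pennsylvania"),
    Fact("Person D", "birth_year", 1830),
    Fact("Person E", "father", "Person B"),
    Fact("Person E", "birth_place", "Ohio"),
]


def pages_where(facts, prop, predicate):
    """Return the set of pages that have a `prop` fact whose value passes `predicate`."""
    return {f.page for f in facts if f.prop == prop and predicate(f.value)}


def born_in_with_father_born_between(facts, place, start, end):
    """People born in `place` whose father was born between `start` and `end`."""
    # Subquery: pages of people born in the given year range.
    fathers = pages_where(facts, "birth_year", lambda year: start <= year <= end)
    born_here = pages_where(facts, "birth_place", lambda p: p == place)
    has_matching_father = pages_where(facts, "father", lambda name: name in fathers)
    return sorted(born_here & has_matching_father)


if __name__ == "__main__":
    print(born_in_with_father_born_between(FACTS, "Pennsylvania", 1849, 1875))
```

With the sample data, only "Person A" satisfies both conditions: "Person C" has a father born outside the range, and "Person E" was born elsewhere.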

    When the site has millions of articles instead of just thousands, it will be even more useful if the software can handle the volumes. No big restrictions are foreseen. Such queries exploit the existing caching mechanisms of MediaWiki, so that most requests for a page with such dynamic contents can be served with little or no performance impact.

    RobinPatters...
    07/07/2010
    Posts:1
  • KIT project widely used
    I am really happy to see this news. Semantic MediaWiki was developed at the AIFB institute of the Karlsruhe Institute of Technology (KIT) by my colleagues and myself, and it is very satisfying that publicly funded (by the EU) research results find application in the "real" world. Picking up the previous poster's comment, it is important to know that any MediaWiki can be turned semantic without too much work, and that although Wikipedia always was and is the ultimate goal for us, we are happy that so many other institutions -- MIT, Museum of Modern Art, NASA, BT, etc. -- and people are using it to their advantage. Yay!

    This week is Wikimania in Gdansk. I hope there will be a strong push in the direction this article is pointing to.

    Semantics to the people!

    vrandezo
    07/08/2010
    Posts:1
    Avg Rating:
    5/5
  • Microformats
    Very surprised there's no mention here of the - literally - millions of microformats already emitted by Wikipedia pages, making their data machine-readable.

    More at http://en.wikipedia.org/wiki/Wikipedia:WikiProject_Microformats

    pigsonthewin...
    07/28/2010
    Posts:1

MIT Massachusetts Institute of Technology CyberMedia © 2010 Technology Review. All Rights Reserved.