RDFa Trials and Travails

So I’ve spent the last few days trying to get my head around RDFa, and I gotta say, it’s not easy.  Excessive complexity is a charge often levelled against RDF and it’s something that RDFa was meant to mitigate.  Has it been successful in that?  I would say partially, yes.

RDFa is pretty easy to understand when you’re looking at marked up content.  Here’s an example of a vcard done up RDFa-style from w3.org:

<p class="contactinfo" about="http://example.org/staff/jo"
   xmlns:contact="http://www.w3.org/2001/vcard-rdf/3.0#">
  <span property="contact:fn">
    Jo Smith
  </span>.
  <span property="contact:title">
    Web hacker
  </span>
  at
  <a rel="contact:org" href="http://example.org">
    Example.org
  </a>.
  You can contact me
  <a rel="contact:email" href="mailto:jo@example.org">
    via email
  </a>.
</p>

Ok, that’s clear enough.  The namespace thing might be a little tricky until it’s explained but even without getting that part… “contact:fn” ok probably full name, “contact:title” job title, “contact:org” organisation… you get the idea.

Manu Sporny put together an RDFa video tutorial, which is a great introduction to RDFa.  It introduces triples and the basics of using vocabularies, which really are the foundation of RDFa.
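
A rough sketch of what the namespace part does, trimmed down from the vcard example above (this is a reading of RDFa 1.0 behaviour, and the comments are added for illustration): the prefix declared with xmlns:contact is glued onto whatever follows the colon, and each property or rel attribute then yields one triple about the resource named in about.

```html
<p about="http://example.org/staff/jo"
   xmlns:contact="http://www.w3.org/2001/vcard-rdf/3.0#">
  <!-- contact:fn expands to http://www.w3.org/2001/vcard-rdf/3.0#fn
       and the element text becomes a literal value:
       <http://example.org/staff/jo> <http://www.w3.org/2001/vcard-rdf/3.0#fn> "Jo Smith" -->
  <span property="contact:fn">Jo Smith</span>
  <!-- rel plus href makes the value a resource instead of a literal:
       <http://example.org/staff/jo> <http://www.w3.org/2001/vcard-rdf/3.0#org> <http://example.org> -->
  <a rel="contact:org" href="http://example.org">Example.org</a>
</p>
```

So “contact:fn” is shorthand for a full URI, and the vocabulary is whatever that URI names.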

Ok, cool.  I start looking at ccREL and I’m seeing all kinds of connections to what we’re doing.  “Why don’t we just use ccREL?” I’m thinking.  They’re missing a few things we want to include, namely terms to describe the attribution trail, so I start digging around Dublin Core and end up at the DCMI Metadata Terms.  Ok, why didn’t Creative Commons use this?  There’s lots we can use in the DCMI Metadata Terms too.  I’m still not seeing anything for maintaining attribution trails in the way we’ve discussed, but at least we’ve found a Dublin Core vocabulary that meets 90% of our needs.

So, on to writing an RDFa vocabulary document to define the terms we need.  Oh look! w3.org even has a chapter titled “3.1 Creating a Custom Vocabulary and Using Compact URIs”.  Here’s what they say about creating a custom vocabulary:

Some structured-data concepts, such as dc:title, dc:date, etc. can be clearly reused from the Dublin Core vocabulary, but other concepts, such as lens settings, camera model, and other photographer parameters, may need to be defined from scratch. For this purpose, Shutr defines a vocabulary namespace URI:

http://shutr.net/vocab/1.0/

Shutr can then publish terms such as http://shutr.net/vocab/1.0/takenWithCamera, http://shutr.net/vocab/1.0/aperture, etc.

Sorry, how exactly does that work?  No links, no examples, and hardly even the slightest hint of how to accomplish what’s in the bleeding chapter title!  This is where RDFa fails to mitigate the complexity of RDF.  Or perhaps they’re just failing to provide documentation.  Or perhaps I’m just failing to find it.  To boot, this chapter has been removed from the latest version of that document.  Now creating vocabularies isn’t mentioned at all.
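
For what it’s worth, one plausible shape for this (a sketch only, not something the W3C chapter spells out): a vocabulary can be published as an ordinary page at the namespace URI, with each term described in RDFa itself.  That matches how the comments below describe the ccREL vocabulary page.  The term URI comes from the Shutr quote above and the rdf/rdfs namespaces are the standard W3C ones; the label and comment text, the photo URL, and the camera name are all made up for illustration.

```html
<!-- Hypothetical fragment of a page served at http://shutr.net/vocab/1.0/ -->
<div xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
     xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
     about="http://shutr.net/vocab/1.0/takenWithCamera"
     typeof="rdf:Property">
  <!-- typeof declares the term as an RDF property; the label gives it a readable name -->
  <h3 property="rdfs:label">takenWithCamera</h3>
  <!-- Illustrative description, not Shutr's actual wording -->
  <p property="rdfs:comment">The camera model a photo was taken with.</p>
</div>

<!-- Hypothetical use of the term on a photo page, via a compact URI -->
<p xmlns:shutr="http://shutr.net/vocab/1.0/"
   about="http://shutr.net/photos/example">
  Taken with a
  <span property="shutr:takenWithCamera">Example Cam 100</span>.
</p>
```

The second fragment is the “Compact URIs” half of the chapter title: shutr:takenWithCamera expands to the full term URI that the first fragment describes.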

I see this as a big problem for adoption.  To describe the RDFa documentation as daunting is to be generous.  Here’s the document describing the RDFa syntax.  Betcha haven’t seen a scrollbar handle that short for a while.

Now I don’t mean to be overly critical here.  That document is a monumental achievement and a fantastic resource serving an important function.  But RDFa needs something aimed at helping people who wish to use the tech without necessarily becoming RDFa gurus.

We came across an email thread at lists.whatwg.org where the possibility of including RDFa in HTML5 is discussed.  It essentially boils down to:

RDFa: include our attributes please!
HTML5 (mainly Ian Hickson): use the existing tags.
RDFa: but we have namespaces and our structured markup will solve all kinds of problems.
HTML5: users won’t implement it correctly anyhow, look at the mess that metadata is right now.

And rinse, and repeat.

Part of me has to agree with Ian Hickson, especially in light of the difficulty I’ve had digesting this format.  Most of the time when I need to learn a new language I’ll hack around with it trying various things.  When I’m doing something wrong I get errors and I can refine my process until I get no errors.  Then I know that at least I’m getting the syntax right and I can build from there.  With RDFa I don’t have that luxury.

I used w3’s validation tool to make sure my markup was well-formed and an RDFa parser/extractor from http://arc.semsol.org/ to see if RDFa was giving me any love.  This is one parser among many (but not that many), each of which gives a slightly different result when fed the same markup.

Ideally there would be an official, litmus-test, if-it’s-green-here-it’s-green-everywhere validation suite that would (at the least):

  1. Validate the vocabulary documents and turn them into a human-readable version of what each term means.
  2. Validate and parse XHTML/RDFa and extract the RDFa tags so you can see if what you’ve written expresses what you mean.

Without these tools we will see no end of broken and/or poorly implemented RDFa tags.

I have managed to put together the beginnings of a very simple RDFa format that supports the attribution trail we’ve been discussing.  I’ll post that in a subsequent entry.  I’ll be at Vancouver’s BarCamp this weekend discussing this format and the possibilities surrounding it.  Hope to see some of you there.

Posted in: General Discussion, Microformat and Mark-up, RDFa

Viewing 7 Comments

    Very cool -- looking forward to talking about this. I have a personal love/hate relationship with microformats that lean mainly toward hate, and I've always suspected that RDFa land had some of the answers. Sounds like we need to bootstrap some example vocabularies?

    Rob,

    Fair criticism re vocabularies. In my experience it is about 80% reuse of existing vocabularies (or ontologies, if you need to impress your boss ;) and only in a few cases do you need to invent terms on your own.

    Please have a look at the following URIs and let me know in case you need more:

    + http://semanticweb.org/wiki/Onotlogies
    + http://esw.w3.org/topic/VocabularyMarket
    + http://www.schemaweb.info/ (not actively maintained)

    Btw, a good way to develop vocabularies and or exchange thoughts would be a VoCamp (http://vocamp.org/wiki/Main_Page).

    Cheers,
    Michael

    Thanks for those resources Michael, good links.

    The vocabulary (or excuse me, ontology ;) that I've put together only adds a few terms, so the 80/20 theory holds true in our case. These added terms all revolve around describing different kinds of attribution; source, yes, but what nature of source? A copy? Derived work? Inspiration?

    A shame to have just missed VoCampOxford. That would have been a wonderful trip.

    Cheers,

    Rob

    Hi Rob,

    You do make a couple of fair criticisms:

    * The RDFa Syntax Document is complex.
    * We don't go into detail about creating vocabularies.
    * Tools for performing vocabulary document validation are non-existent.

    I just wanted to point out that the RDFa Syntax Document isn't meant for everyday web authors - it is meant for people that must have a specification for developing RDFa parsers. If all you want to do is re-use RDFa and you don't want to learn about how it works, the RDFa Syntax Document is not for you.

    There is a community site meant to help people learn and understand RDFa, this is where we hope to put the more accessible content about RDFa:

    http://rdfa.info/wiki

    There are a couple of criticisms that you make, that are not accurate:

    * Creative Commons should have re-used Dublin Core.
    * Good RDFa validators/extractors do not exist.
    * There is no way to currently do attribution trails.
    * There needs to be a RDF to human-readable vocabulary conversion tool

    There is a very good whitepaper that explains why CCrel was created:

    http://cms.communia-project.eu/node/79

    In a nutshell, they couldn't just re-use Dublin Core because it didn't have the properties that they needed. Take a look at their vocabulary (which is both machine readable and human readable):

    http://creativecommons.org/ns#

    Additionally, Creative Commons extended Dublin Core with their own vocabulary terms. This is a major feature in RDFa and the point shouldn't be lost. Creative Commons didn't have to talk to anybody to extend Dublin Core - innovation can happen in a distributed fashion instead of through a centralized standards process. Authoring web vocabularies is something that we haven't focused on yet, but will be detailed on the rdfa.info/wiki site. We had to focus on creating RDFa - the rest will come in time.

    There are a couple of RDFa extractors and display programs for Firefox - Operator and Fuzzbot. Support for RDFa will get better in time. In the meantime, you can see a video of Fuzzbot in action here:

    http://www.youtube.com/watch?v=oPWNgZ4peuI

    You can create the attribution trails that you speak of by using dcterms:source from the Dublin Core Terms vocabulary. I believe it supports your use case:

    http://dublincore.org/documents/dcmi-terms/#ter...

    Lastly, there does not need to be an RDFa vocabulary validation/display tool for people now that there is RDFa. Note that the Media, Audio, Video, Commerce, and CCrel vocabularies are all marked up using RDFa. This makes them RDF vocabularies that are human and machine readable:

    http://purl.org/media/
    http://purl.org/media/audio
    http://purl.org/media/video
    http://purl.org/commerce
    http://creativecommons.org/ns#

    Hope this helps - the learning curve for RDFa is steeper than we want it to be right now because there aren't many good tutorials out there. The concepts can be distilled fairly simply but we just haven't gotten much together just yet - we've been busy with creating RDFa. The next year or two will be focused on creating tools for RDFa and teaching it to web authors.

    Thanks for your interest in RDFa and for blogging about it - I hope some of the information above was helpful.

    Hi Rob,

    I just noticed that your link to the RDFa Primer is to a 1-year-old draft. Have you checked out the latest version:

    http://www.w3.org/TR/xhtml-rdfa-primer/

    It should be much simpler and easier to understand.

    One important issue it does NOT address is the creation of new vocabularies, in part because that's a fairly advanced topic. That said, your feedback on the recent RDFa Primer would be super helpful.

    Hi Ben,

    Yes, I noticed the newer version, and that vocabularies are no longer mentioned in it. I whole-heartedly agree that vocabularies are an advanced topic and the primer might not be the best place to describe them, but I still think they need an entry point that's a little more accessible than what currently exists.

    What I'd *love* to see is something along the lines of what I mention at the end of this post - a set of "gold standard" tools (validators and parsers) that developers can use to let them know that they're on the right track without needing to grok the entire RDFa specification.

    I like the primer overall. I think it does a good job of showing the reason for RDFa and then gives just enough information to get a vague idea of the practice. You've no doubt come across this video tutorial which contains about the same level of information:
    http://www.youtube.com/watch?v=ldl0m-5zLz4

    But it's easy to digest... Oohh, flashing lights! ;)

    I don't think we should underestimate the issue of accessibility. I see RDFa as reaching out to the "normal" world from the somewhat-airy heights of RDF. It needs to be a welcoming handshake to bring people on board.

    Cheers.

    Like this blog a lot ... thanks for sharing and giving such high quality links for RDFa.

PlayTheWeb.org is an ad hoc group of Web professionals who are interested in promoting the idea of "Web Play" through the ethical reuse of content on the Web. We want to report, discuss, and promote Technologies, Techniques, Applications, and Business models that move this idea forward.