TechnoBlog
Some computer and technology related musings.

Sunday, April 26, 2009
Parsing XMP with python & rdflib

I have a Python script that creates a web gallery. Currently it uses the IPTC info in the files. I've wanted to add XMP support for some time but couldn't find a solution. The Adobe SDK is written in C++ and doesn't work with Python. The Exempi project rewrote it in C for use with Python, but it's apparently tied to pthreads on *nix. Since I only need to read XMP, not create it, I started looking at RDF parsers for Python and found rdflib. To get the XMP data out of the images, I wrote a script based on the Adobe SDK's information about how XMP is stored in various file types.
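The extraction script itself isn't shown here, so the following is only a rough sketch of one way to pull a packet out of a file. It is a brute-force byte scan, not the per-format lookup the Adobe SDK documentation describes, and it assumes the packet is stored uncompressed and wrapped in an x:xmpmeta element. The function name extract_xmp_packet and the sample bytes are made up for illustration.

```python
def extract_xmp_packet(data: bytes):
    """Return the first <x:xmpmeta>...</x:xmpmeta> span in data, or None."""
    start = data.find(b"<x:xmpmeta")
    if start == -1:
        return None
    end_tag = b"</x:xmpmeta>"
    end = data.find(end_tag, start)
    if end == -1:
        return None
    return data[start:end + len(end_tag)]


# Fake "image" bytes for illustration: binary junk around an embedded packet.
sample = (
    b"\xff\xd8\xff\xe1junk"
    + b'<x:xmpmeta xmlns:x="adobe:ns:meta/"><rdf:RDF/></x:xmpmeta>'
    + b"\x00\xff\xd9"
)
packet = extract_xmp_packet(sample)
```

For a real file you would read it in binary mode and pass the bytes in. A scan like this can miss packets that a format stores compressed or split across segments, which is why the per-format rules are the safer route.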

First off, XMP is a subset of RDF encoded in XML. RDF deals with data in triples (subject, predicate, value), which together form a graph. Lists and arrays are formed by triples that share the same subject: one of the predicates is 'type', with a value of 'Alt', 'Bag', 'List', or 'Seq', and the other predicates are either empty or are '_1', '_2', etc., which designate the order of the entries.
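To make the container layout concrete, here is a small sketch that models the triples as plain Python tuples rather than using rdflib. The file name, the keywords, and the blank node label "_:b0" are invented for illustration; the two namespace URIs are the standard RDF and Dublin Core ones.

```python
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical triples for a dc:subject keyword list stored as a Bag.
# "_:b0" stands in for the blank node that represents the container.
triples = [
    ("photo.jpg", DC_NS + "subject", "_:b0"),
    ("_:b0", RDF_NS + "type", RDF_NS + "Bag"),
    ("_:b0", RDF_NS + "_2", "sunset"),
    ("_:b0", RDF_NS + "_1", "beach"),
]


def container_items(triples, node):
    """Return the members of the container at node, ordered by _1, _2, ..."""
    members = []
    for subj, pred, obj in triples:
        # Only membership predicates (rdf:_1, rdf:_2, ...) on this node count.
        if subj != node or not pred.startswith(RDF_NS + "_"):
            continue
        suffix = pred[len(RDF_NS) + 1:]
        if suffix.isdigit():
            members.append((int(suffix), obj))
    return [obj for _, obj in sorted(members)]


items = container_items(triples, "_:b0")
print(items)  # ['beach', 'sunset']
```

The point is that the value of dc:subject is not the keywords themselves but a node, and the keywords hang off that node under numbered predicates. That is why a lookup that returns a single value isn't enough for this kind of data.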

The first problem I ran into was a lack of documentation on rdflib. The only tutorials and examples I could find dealt with retrieving a single value, which doesn't help when most of the data I needed was Alt or Bag. The second problem was that the parser was choking on the data I gave it, which contained XML other than RDF. It just occurred to me that I should have tried load instead of parse. Anyway, here is the method I've used to retrieve data:


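from rdflib import BNode, RDF
from rdflib.graph import Seq  # import path in current rdflib releases

def getData(graph, pred):
    '''Return the first value for pred: a plain value, a Seq wrapper, or None.'''
    # Take the first object of any triple with this predicate (None if absent).
    x = next(graph.objects(None, pred), None)
    if not isinstance(x, BNode):
        return x
    # A blank node may be a container; its rdf:type says which kind.
    # RDF.type replaces the older Namespace(RDF.RDFNS)['type'] lookup.
    val = graph.value(subject=x, predicate=RDF.type)
    if val in [RDF.Alt, RDF.Bag, RDF.Seq, RDF.List]:
        return Seq(graph, x)
    return None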
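
On the second problem mentioned above (the parser choking on XML that isn't RDF), one possible workaround is to cut the packet down to its rdf:RDF element before handing it to the RDF parser. This is a sketch using only the standard library's ElementTree, not what my script does; the helper name isolate_rdf and the sample packet are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def isolate_rdf(packet: str):
    """Return the rdf:RDF element of an XMP packet as an XML string, or None."""
    root = ET.fromstring(packet)
    tag = "{%s}RDF" % RDF_NS
    # The packet root may be rdf:RDF itself or a wrapper such as x:xmpmeta.
    node = root if root.tag == tag else root.find(".//" + tag)
    if node is None:
        return None
    return ET.tostring(node, encoding="unicode")


# Made-up packet: an x:xmpmeta wrapper around a one-property description.
sample = (
    '<x:xmpmeta xmlns:x="adobe:ns:meta/">'
    '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">'
    '<rdf:Description rdf:about="" xmlns:dc="http://purl.org/dc/elements/1.1/">'
    '<dc:format>image/jpeg</dc:format>'
    '</rdf:Description>'
    '</rdf:RDF>'
    '</x:xmpmeta>'
)
rdf_xml = isolate_rdf(sample)
```

ElementTree rewrites the namespace prefixes (ns0, ns1 and so on) when it serializes, but the namespace URIs are preserved, so the result should still be plain RDF/XML that could be passed to rdflib as Graph().parse(data=rdf_xml, format="xml").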