I’ve just been cruising through my Twitter account, and discovered a link on why we should “think data”. It turned out to lead to a blog post from Paul Stainthorp over in L&LR, which he discovered from Joss Winn down in CERD, which is originally from Mike Ellis, who presented it at the JISC Digital Content Conference.
Now that I’m done with the link stuffing, on to what I think of the presentation. To put it bluntly, the presentation is absolutely right. I wish it had been around when I was at the away day with IT, because it summarises perfectly why the University needs to get its data out of the individual services (often tied firmly into a web interface) and into an open, machine-readable format which any service, system or person can access as necessary.
In true academic style, I went hunting for papers to back up my claim that separation is good. I found one from IEEE – Separation of Data and Presentation for the Next Generation Internet Using the Four-Tier Architecture – which, going by the abstract, is exactly what I want. Obviously I wanted to read more and get some good citations, but this is where I ran into something which backs up my claim even better.
I can’t get to the article.
The University has a subscription to the IEEE journals, accessible through Athens. e-Library (Portal login required) tells me that I can access it. As soon as I click the link, however, I am immediately taken away from the University site, forced through Athens, then taken back through a University proxy server which frames the IEEE site and doesn’t work past the search page.
Surely, the data should be firmly separated from the IEEE website and made accessible to the University’s library systems in a way which allows it to be seamlessly accessed as though it were part of the Library’s own collection? My expectation would be that if I can access a library service then I should be able to search it and view it from within a single point. This isn’t an attempt at ‘stealing’ the data for nefarious purposes; I have a legitimate purpose for accessing the data and the University have arranged permission for me to do so. I would therefore expect the data to be available free from the carefully walled, controlled and heavily branded IEEE website, in a standardised way which the Library website (well, Horizon would probably explode at the concept) can understand and present to me.
So, back to the University. I have a prediction – if Timetabling can provide an open data feed of the status of rooms (basically, the times a room is occupied) then within a month some enterprising student will have built a ‘Find Me A Room’ service which Timetabling themselves would have taken 6 months to do at half the quality. If Estates can make information on the facilities each room has (projector, AV system, whiteboard etc) available then the ‘Find Me A Room’ service would even let you organise and sort it by criteria.
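To give a rough idea of how little code such a service needs once the data is open, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the two feeds, their field names, the room codes and the `find_rooms` function are all hypothetical, since neither Timetabling nor Estates publishes anything like this yet.

```python
from datetime import datetime

# Hypothetical Timetabling feed: one record per booking (field names assumed).
BOOKINGS = [
    {"room": "MB1001", "start": "2009-08-06T09:00", "end": "2009-08-06T11:00"},
    {"room": "MB1002", "start": "2009-08-06T10:00", "end": "2009-08-06T12:00"},
]

# Hypothetical Estates feed: the facilities available in each room.
FACILITIES = {
    "MB1001": {"projector", "whiteboard"},
    "MB1002": {"projector", "av_system"},
    "MB1003": {"whiteboard"},
}


def find_rooms(bookings, facilities, start, end, required=()):
    """Return rooms that are free for the whole slot and have every required facility."""
    slot_start = datetime.fromisoformat(start)
    slot_end = datetime.fromisoformat(end)

    # A room is busy if any of its bookings overlaps the requested slot.
    busy = set()
    for booking in bookings:
        booked_from = datetime.fromisoformat(booking["start"])
        booked_to = datetime.fromisoformat(booking["end"])
        if booked_from < slot_end and slot_start < booked_to:
            busy.add(booking["room"])

    # Keep rooms that are not busy and whose facilities cover the requirements.
    wanted = set(required)
    return sorted(
        room
        for room, kit in facilities.items()
        if room not in busy and wanted <= kit
    )


if __name__ == "__main__":
    print(find_rooms(BOOKINGS, FACILITIES,
                     "2009-08-06T11:00", "2009-08-06T12:00",
                     required={"projector"}))
```

The point is that the matching logic is trivial; the hard part is getting the two feeds published in the first place.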
The ball, it seems, is in the court of the data providers.
You should propose this as a ‘good idea’ under the current HR campaign for suggestions and see if it’s taken up.
By: Joss Winn on August 6, 2009
at 6:50 am
No argument from me. And what goes for timetabling ought to go for the library and “our” bibliographic data, in whacking great spades.
Also, apologies that you’ve found the IEEE access less than simple. UoL’s access to their CS digital library material *isn’t* actually controlled by Athens – the only means we have of authenticating to their site is by supplying them with our IP addresses. Hence the need for the proxy server, which I’m sorry wasn’t working for you. I’ll take a look at it.
Now, if the professional bodies* for Computer Science aren’t making their data available in an easily-reusable way, what chance for Art & Design, or Psychology, or Agriculture?
“My expectation would be that if I can access a library service then I should be able to search it and view it from within a single point.”
That’s the goal. But we’re a long way off, and we need the publishers on board.
Did you get hold of the article, in the end?
*Yes, plural – the ACM’s site (https://portal.lincoln.ac.uk/C7/C6/ACM) is also locked down to IP.
By: Paul Stainthorp on August 6, 2009
at 8:57 am
Ahhh…. IEEE have revamped their site, and it’s broken the proxy. See what I miss, now that I’m not working with DCI anymore?!
By: Paul Stainthorp on August 6, 2009
at 9:01 am
Thanks to Paul who dredged up the article from on-campus and mailed it to me – I’ll take a proper look through it on the train today.
By: Nicholas Jackson on August 6, 2009
at 9:11 am
Hey Nick, thanks for the mention – appreciated.
At IWMW 2009, Tony Hirst and I did a session called Mashups Round the Edges in which we focused on ways of getting data out even when institutions don’t make it easy for you. I’ll be publishing our slides shortly at slideshare.net/dmje. Tony has already done a blog post on table scraping, here.
Obviously, the preference is for MRD, but there’s often a way without.
By: Mike Ellis on August 6, 2009
at 12:24 pm