The JLeRN Experiment

JISC's Learning Registry Node Experiment at Mimas


JLeRN node upgraded

The node previously ran version 0.23.5 of the LR code and now runs version 0.23.7. Version 0.23.6 was a major release and 0.23.7 a minor one.

The release notes for version 0.23.6 state the following:

  • Data Services NEW
    • The extract API is included and enabled in this distribution. The Extract API brings a “batteries included” interface for tailoring data extraction and query to suit more narrow and varied use case needs without high resource requirements.
    • Data Service views are NOT installed by default. Node administrators can install any or all of the 3 standards alignment focused data services using Python CouchApp. This prevents unneeded consumption of disk space by data services that are not used.
    • Interactive documentation explaining Data Services and how to roll your own for your use case.
  • Resource Data Distribution refactoring CHANGED
    • Distribute has been significantly refactored to reduce the need for double storage of documents. The new distribute uses two databases, incoming and resource_data. New tainted documents can be distributed to incoming, and a node may impose an internal policy for untainting documents before moving them into resource_data. All harvest services still operate against resource_data.
    • More compatible with Learning Registry 0.23 specification. Document Types of resource_data are distributed, not resource_data_distributable.
    • Upgrade will require some minor configuration changes to NGINX to change or expose incoming endpoint.
    • IMPORTANT: Legacy nodes can distribute to a 0.23.6 node by adjusting their service documents to use the incoming endpoint. A legacy node cannot be the destination for a 0.23.6 server.
  • Subscription for Distribute NEW
    • You may now visit http://[node address]/register and enter the URL of your node (use https if SSL is required for access).
  • Support for CouchDB 1.2.0 NEW
    • This is a highly recommended upgrade. It significantly improves storage and resource utilization, and paves the way for features planned for future releases.
  • Many Bug fixes

I have not yet installed the Data Services views (which relate to standards alignment), but if anybody wants them, please tell me.

Next steps:
1. Test the new Extract API
2. Test distributing data between nodes

I’ll report back with the test results.

Node Explorer

We now have a prototype of a tiny app you can use to explore what is on our node. It is based on earlier work by the LR developers in the U.S., called LR statistics. You could clone the app and adapt it to explore any LR node; the source code is available in our JLeRN GitHub repository.

The app is here:
http://jlern.iriscouch.com/resource_data/_design/explorer/start.html

This version of the app runs on CouchDB in the cloud. (It literally took seconds to create a CouchDB instance, hosted by Iris Couch.) Once the prototype was ready, we replicated our resource data from the JLeRN Alpha node to the cloud instance and pushed the app there too. The app is just another JSON document stored in CouchDB.
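Copying a database from one CouchDB to another is what CouchDB's built-in replicator does. The following is a minimal sketch, assuming Python 3 and CouchDB's `POST /_replicate` endpoint, of the kind of request involved. It is not necessarily how we ran the replication. The source database URL and the server that receives the request are hypothetical (the Alpha node's CouchDB address is not given here), and the code only builds the request without sending it.

```python
import json
import urllib.request

# Hypothetical source: the Alpha node's CouchDB address is an assumption.
SOURCE_DB = "http://localhost:5984/resource_data"
# Target database, matching the path in the cloud app's URL.
TARGET_DB = "http://jlern.iriscouch.com/resource_data"
# Hypothetical: the CouchDB server asked to run the replication.
REPLICATE_URL = "http://localhost:5984/_replicate"


def build_replication_request(source, target, continuous=False,
                              replicate_url=REPLICATE_URL):
    """Build (but do not send) a CouchDB POST /_replicate request."""
    body = {"source": source, "target": target}
    if continuous:
        # Keep replicating new changes instead of making a one-off copy.
        body["continuous"] = True
    return urllib.request.Request(
        replicate_url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Usage: urllib.request.urlopen(build_replication_request(SOURCE_DB, TARGET_DB))
# would start a one-off replication (the target may also need credentials).
```

A one-off replication copies a snapshot. The `continuous` flag is one possible way to keep a copy in step with its source.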

Tonight the document count is 15955. This will not change if new documents are published to the JLeRN Alpha node. Live updates could probably be implemented, but this would require further testing. [28 May: We have now implemented live updates. In IE 8 and 9, the app works as expected, but the status information might say the node is down when it is actually up. This is caused by your browser’s security settings. In your Internet Options, (1) add “jlern.iriscouch.com” as a trusted site; (2) lower the security level for that zone to Medium-Low; and (3) answer “yes” to the security question when you start the app. Or contact us.]
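One common way to follow live changes in CouchDB is to poll its `_changes` feed. The following is a minimal sketch, assuming Python 3, of building the long-poll URL and folding each response into a set of known document IDs. It does not necessarily match how the Node Explorer implements live updates. The function names are hypothetical, and the sample response is an illustrative shape, not real node data. Design documents are skipped because the app itself lives in the same database.

```python
import json
from urllib.parse import urlencode

DB_URL = "http://jlern.iriscouch.com/resource_data"


def changes_url(db_url, since):
    """URL for a long-polling request to CouchDB's _changes feed."""
    return db_url + "/_changes?" + urlencode({"feed": "longpoll", "since": since})


def apply_changes(response_text, doc_ids):
    """Update a set of document IDs from one _changes response.

    Design documents (such as the app itself) are skipped. Returns the
    last_seq value to pass as `since` on the next poll.
    """
    data = json.loads(response_text)
    for row in data["results"]:
        if row["id"].startswith("_design/"):
            continue
        if row.get("deleted"):
            doc_ids.discard(row["id"])
        else:
            doc_ids.add(row["id"])
    return data["last_seq"]


# Illustrative response shape only, not real node data.
SAMPLE_CHANGES = json.dumps({
    "results": [
        {"seq": 1, "id": "doc-a", "changes": [{"rev": "1-a"}]},
        {"seq": 2, "id": "_design/explorer", "changes": [{"rev": "1-b"}]},
        {"seq": 3, "id": "doc-b", "changes": [{"rev": "1-c"}]},
        {"seq": 4, "id": "doc-c", "changes": [{"rev": "2-d"}], "deleted": True},
    ],
    "last_seq": 4,
})

# Usage loop (needs network access):
#   seq, ids = 0, set()
#   while True:
#       with urllib.request.urlopen(changes_url(DB_URL, seq)) as resp:
#           seq = apply_changes(resp.read().decode("utf-8"), ids)
#       # len(ids) is the live document count
```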

We could further develop the Node Explorer in any number of ways.  Please do let us know what you think would be useful.

Jorum OAI-PMH data published

The starting point for this experiment was a document written by the LR developers in the U.S., the OAI-PMH to Learning Registry Publish Utility. I installed the third-party Python OAI-PMH module and started working with the Python script the LR developers provide, LR-harvest-and-publish.py. The script needed some code to handle basic authentication with our JLeRN Alpha node. I then published some sample Jorum data, but the JSON documents weren’t quite right: the resource_locators (URIs) were incorrect and the keys (subject terms) were missing altogether.

In the case of Jorum, the keys correspond to the Dublin Core subject terms in the OAI-PMH data. I installed the third-party Python ElementTree module to parse the OAI-PMH data, located the subject terms, and inserted them into the JSON document before publishing, because the LR application indexes the keys when the document is published. Finally, I harvested all the Jorum OAI-PMH data and published it on our node. (We have put our code in a repository on GitHub in case others can benefit from anything we happen to write.)
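The subject-term step can be sketched with ElementTree, which is part of the standard library in Python 3 as `xml.etree.ElementTree`. This is a minimal sketch assuming an `oai_dc` record, not the code in our repository or in LR-harvest-and-publish.py. The helper names are hypothetical, and the sample XML uses illustrative values rather than a real Jorum record. The envelope fields are modelled on the sample document further down this page. Taking the first `dc:identifier` as the resource locator is an assumption; real records may need a smarter choice.

```python
import xml.etree.ElementTree as ET

# Standard namespaces used in OAI-PMH oai_dc records.
NS = {
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Illustrative record only (not real Jorum data).
SAMPLE_DC = """<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Example resource</dc:title>
  <dc:subject>physics</dc:subject>
  <dc:subject>magnetism</dc:subject>
  <dc:identifier>http://example.org/handle/1</dc:identifier>
</oai_dc:dc>"""


def subject_terms(dc_xml):
    """Return the non-empty dc:subject values from an oai_dc record."""
    root = ET.fromstring(dc_xml)
    return [el.text.strip() for el in root.findall("dc:subject", NS)
            if el.text and el.text.strip()]


def build_envelope(dc_xml, submitter):
    """Wrap an oai_dc record in an LR resource_data envelope (sketch)."""
    root = ET.fromstring(dc_xml)
    # Assumption: the first dc:identifier is the resource's URI.
    locator = root.findtext("dc:identifier", default="", namespaces=NS)
    return {
        "doc_type": "resource_data",
        "doc_version": "0.23.0",
        "resource_data_type": "metadata",
        "active": True,
        "identity": {"submitter": submitter, "submitter_type": "agent"},
        "TOS": {"submission_TOS": "http://www.learningregistry.org/tos/cc0/v0-5/"},
        "resource_locator": locator,
        # Keys must be in place before publishing: the node indexes them then.
        "keys": subject_terms(dc_xml),
        "payload_placement": "inline",
        "payload_schema": ["oai_dc"],  # assumed schema label
        "resource_data": dc_xml,
    }
```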

Example queries:
http://alpha.mimas.ac.uk/obtain?request_ID=http://dspace.jorum.ac.uk/xmlui/handle/123456789/1000
http://alpha.mimas.ac.uk/slice?identity=jorum
http://alpha.mimas.ac.uk/slice?identity=mimas
http://alpha.mimas.ac.uk/slice?any_tags=curve
http://alpha.mimas.ac.uk/slice?any_tags=magnetism
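Scripts can build the same query URLs. The examples above pass the `request_ID` URL unencoded, but percent-encoding parameter values is a safer habit. The following is a minimal sketch assuming Python 3's `urllib.parse`. The helper name `lr_query` is hypothetical.

```python
from urllib.parse import urlencode

NODE = "http://alpha.mimas.ac.uk"


def lr_query(service, **params):
    """Build a GET URL for an LR service such as obtain or slice."""
    url = f"{NODE}/{service}"
    if params:
        # urlencode percent-encodes values, e.g. ':' and '/' in request_ID.
        url += "?" + urlencode(params)
    return url


if __name__ == "__main__":
    print(lr_query("slice", identity="jorum"))
    print(lr_query("slice", any_tags="magnetism"))
    print(lr_query("obtain",
                   request_ID="http://dspace.jorum.ac.uk/xmlui/handle/123456789/1000"))
```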

Alpha node

We installed the JLeRN “alpha” node on our new Ubuntu server on the weekend of Jan 21-22, before the Hackday. However, a bug affected the indexes and therefore document retrieval.

The LR developers in the U.S. resolved the bug promptly and we applied the fix on Jan 28th.

The node is here: http://alpha.mimas.ac.uk

The node supports all publish and retrieval services although we haven’t had a chance to test all of these yet.

The node is now open to the world, although you need a username and password to publish documents. The credentials you need to publish are:

Username: fred
Password: flintstone

Give it a go when you have a minute or two.

Examples:
http://alpha.mimas.ac.uk/status
http://alpha.mimas.ac.uk/description
http://alpha.mimas.ac.uk/obtain  [all documents in node, 100 at a time]
http://alpha.mimas.ac.uk/obtain?request_ID=http://dspace.jorum.ac.uk/xmlui/handle/123456789/1000
http://alpha.mimas.ac.uk/slice?any_tags=curve
http://alpha.mimas.ac.uk/slice?any_tags=magnetism

N.B.: The examples above return JSON documents. To display them in a readable format, install an extension for your favourite browser, e.g., the JSONView extension for Firefox.
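If you would rather not install a browser extension, Python's standard `json` module can do the same job. The following is a minimal sketch assuming Python 3. The function names are hypothetical. It re-indents a JSON document and sorts its keys.

```python
import json
import urllib.request


def pretty(json_text):
    """Re-indent JSON text for humans, with keys sorted."""
    return json.dumps(json.loads(json_text), indent=2, sort_keys=True)


def fetch_pretty(url):
    """Fetch a JSON document from a node and return it pretty-printed."""
    with urllib.request.urlopen(url) as resp:
        return pretty(resp.read().decode("utf-8"))


# Usage (needs network access to the node):
#   print(fetch_pretty("http://alpha.mimas.ac.uk/status"))
```

From a shell, piping curl output into `python -m json.tool` has a similar effect.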

Curl examples:
$ curl -v "http://alpha.mimas.ac.uk/obtain?request_ID=http://dspace.jorum.ac.uk/xmlui/handle/123456789/1000"
$ curl -v -X POST -H "Content-Type: application/json" "http://alpha.mimas.ac.uk/publish" -d @test_data.json -u fred
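The publish request above can also be sketched in Python. This is a minimal sketch assuming Python 3 and that test_data.json already holds a valid publish body, as in the curl example. The helper names are hypothetical, and the code builds the request without sending it. Like `curl -u`, it sends the Basic Authorization header up front rather than waiting for a 401 challenge.

```python
import base64
import urllib.request

PUBLISH_URL = "http://alpha.mimas.ac.uk/publish"


def basic_auth_header(username, password):
    """Value for an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return "Basic " + token


def build_publish_request(body_bytes, username, password):
    """Build (but do not send) a POST to the node's publish service."""
    return urllib.request.Request(
        PUBLISH_URL,
        data=body_bytes,
        headers={
            "Content-Type": "application/json",
            "Authorization": basic_auth_header(username, password),
        },
        method="POST",
    )


# Usage (sends the request):
#   with open("test_data.json", "rb") as f:
#       req = build_publish_request(f.read(), "fred", "flintstone")
#   with urllib.request.urlopen(req) as resp:
#       print(resp.read().decode("utf-8"))
```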

The Learning Registry Quick Reference Guide has more examples.

Node of Mimas

Nick here. I work at Mimas with Sarah and Bharti. I installed a node in mid-December on a spare machine I had lying around. As it was nothing “official”, I called it Node of Mimas. One simple way to check the node’s status is to execute a command like this:

$ curl http://<…>/status

{ "node_name": "Node of Mimas", "node_id": "7ee34ddd8e4f4ea4aafd59c3c7619a16", "active": true, "timestamp": "2012-01-11T20:32:01.097068Z", "start_time": "2011-12-31T10:01:07.698981Z", "install_time": "2011-12-14T14:44:24.206372Z", "earliestDatestamp": "2011-12-14T15:34:23", "doc_count": 1 }
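The status document is also easy to read from a script. The following is a minimal sketch assuming Python 3. It parses the Node of Mimas output above and reports the node's name, activity flag, document count and uptime. The function name is hypothetical. The code assumes that `timestamp` is the time the status was generated, so uptime is computed as `timestamp` minus `start_time`.

```python
import json
from datetime import datetime

# The Node of Mimas status output shown above.
STATUS = (
    '{"node_name": "Node of Mimas", "node_id": "7ee34ddd8e4f4ea4aafd59c3c7619a16", '
    '"active": true, "timestamp": "2012-01-11T20:32:01.097068Z", '
    '"start_time": "2011-12-31T10:01:07.698981Z", '
    '"install_time": "2011-12-14T14:44:24.206372Z", '
    '"earliestDatestamp": "2011-12-14T15:34:23", "doc_count": 1}'
)

# timestamp and start_time carry microseconds and a trailing "Z".
TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def summarise_status(status_text):
    """Pick out the node name, activity flag, document count and uptime."""
    status = json.loads(status_text)
    started = datetime.strptime(status["start_time"], TS_FORMAT)
    # Assumption: timestamp is when the status document was generated.
    now = datetime.strptime(status["timestamp"], TS_FORMAT)
    return {
        "name": status["node_name"],
        "active": status["active"],
        "doc_count": status["doc_count"],
        "uptime_days": (now - started).days,
    }


print(summarise_status(STATUS))
# → {'name': 'Node of Mimas', 'active': True, 'doc_count': 1, 'uptime_days': 11}
```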

The output of the status command is a JSON document. It shows that this node was installed on December 14th, was last restarted on New Year’s Eve, and contains one document. I wonder what it is? Execute the following command to find out:

$ curl http://<…>/obtain

{"documents":[{"document": [{"doc_type": "resource_data", "resource_locator": "URI_of_resource", "resource_data": "Put_anything_like_metadata, xml_or_whatever_here", "update_timestamp": "2011-12-14T15:34:23.219869Z", "keys": ["science", "what_ever_you_want"], "TOS": {"submission_TOS": "http://www.learningregistry.org/tos/cc0/v0-5/"}, "_rev": "1-f81b1258b28092a314661527ad7dcbf0", "resource_data_type": "metadata", "payload_placement": "inline", "payload_schema": ["hashtags", "describing", "resource", "format"], "node_timestamp": "2011-12-14T15:34:23.219869Z", "doc_version": "0.23.0", "create_timestamp": "2011-12-14T15:34:23.219869Z", "active": true, "publishing_node": "7ee34ddd8e4f4ea4aafd59c3c7619a16", "_id": "551592f1743d46a7b4f4d6c7484e356a", "doc_ID": "551592f1743d46a7b4f4d6c7484e356a", "identity": {"owner": "", "submitter": "Your name or organization here", "submitter_type": "agent", "signer": "Your name or organization if signing the document", "curator": ""}}], "doc_ID": "URI_of_resource"}]}

Let’s make the output more readable for humans (and edit it slightly):

[Image: Sample document, the JSON document above formatted for readability]

Well, it’s just a test document, in JSON format, but it gives you an indication of what Learning Registry data looks like.
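The obtain response nests its envelopes. It is a list of `documents`, and each entry has a `doc_ID` and a `document` list of envelopes. The following is a minimal sketch assuming Python 3 that flattens this structure and maps each resource_locator to its keys. The function names are hypothetical, and the sample is a trimmed copy of the obtain output above.

```python
import json

# Trimmed copy of the obtain output above.
OBTAIN = json.dumps({
    "documents": [{
        "doc_ID": "URI_of_resource",
        "document": [{
            "doc_type": "resource_data",
            "resource_locator": "URI_of_resource",
            "keys": ["science", "what_ever_you_want"],
            "doc_ID": "551592f1743d46a7b4f4d6c7484e356a",
        }],
    }]
})


def iter_envelopes(obtain_text):
    """Yield every envelope in an obtain response."""
    for entry in json.loads(obtain_text).get("documents", []):
        for envelope in entry.get("document", []):
            yield envelope


def keys_by_locator(obtain_text):
    """Map each resource_locator to the keys published for it."""
    result = {}
    for env in iter_envelopes(obtain_text):
        result.setdefault(env["resource_locator"], []).extend(env.get("keys", []))
    return result


print(keys_by_locator(OBTAIN))
# → {'URI_of_resource': ['science', 'what_ever_you_want']}
```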

The Hackday in Manchester is fast approaching. We expect to be given a new server by the end of this week, but it will take a few days to install and test a new node we are calling Alpha. By the middle of next week we will decide which node to use for the Hackday: Node of Mimas or Alpha. That’s all for now.
