The JLeRN Experiment

JISC's Learning Registry Node Experiment at Mimas

Archive for the category “Technical achievements”

Understanding and using the Learning Registry: Pgogy tools for searching and submitting

I started with the Learning Registry as part of Plugfest 1, which was just over a year ago in Washington D.C. (see my CETIS blog post reporting on it here).


More on the Pat Lockley tools described here, plus his other thoughts and projects, can be found on his Pgogy website.

Part of what I think people don’t get about the Learning Registry is this: as the Internet became the Web, the Web became a series of distinct destinations (Facebook, Google, Twitter, etc.). The Learning Registry exists, but it doesn’t have a front page or a “Tweet this” button; it exists the way HTTP exists. You can build using the Learning Registry, but you might never see it.

So that is poorly explained, but that is an innate part of the problem – and it’s a problem I sought to rectify at the first Plugfest. If I can’t take people to the Learning Registry, then I should take the Learning Registry to people. How? …

A Chrome tool: the Learning Registry enhancing Google searching

Google is the biggest store of links in the world (I shy away from the ‘R’ word), and so it is where people go to search. Some of the links returned via a Google search will also be links stored in a Learning Registry node – so you can, via one of my Learning Registry tools, check to see if it knows anything about a website.

So you can do a Google search, click on a button, and really quickly (the Learning Registry technology is state of the art) see which pages the Learning Registry knows about. Sometimes this knowledge will be just keywords, authors and descriptions, but it could also include educational levels and the type of interactivity supported by the page.
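The plugin itself is a Chrome extension, so its code isn't shown here. A lookup of this kind can be illustrated in Python: for each search-result URL, ask the node's obtain service (the same request_ID query used in the example queries on this blog) whether it holds any documents about that URL. The response shape assumed below, the treatment of an HTTP 404 as "no documents", and the names known_results and fetch_json are all assumptions for illustration.

```python
import json
import urllib.error
import urllib.request
from urllib.parse import urlencode

# Assumption: the JLeRN Alpha node used in the example queries on this blog.
NODE = "http://alpha.mimas.ac.uk"


def fetch_json(url):
    """GET a URL and parse the JSON reply; treat 404 as 'no documents' (assumption)."""
    try:
        with urllib.request.urlopen(url) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return {"documents": []}
        raise


def known_results(result_urls, fetch=fetch_json, node=NODE):
    """Return the result URLs for which the node holds at least one document."""
    known = []
    for url in result_urls:
        reply = fetch(f"{node}/obtain?" + urlencode({"request_ID": url}))
        # Assumed shape: {"documents": [{"doc_ID": ..., "document": [envelope, ...]}]}
        if any(entry.get("document") for entry in reply.get("documents", [])):
            known.append(url)
    return known
```

The fetch function can be swapped out, so the same logic can be tested offline or pointed at another node.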

You can download the Chrome plugin here – and watch a short demo video below showing you how it works.

Note on trying this from Sarah Currier: Once you’ve installed this in Chrome, you can try it out by running any search in Jorum. Nearly all of Jorum’s OERs have metadata in the JLeRN node, so you should see lots of small crosses (+) to click on next to your search results. If you want to try it in Google search, search for something you know is in Jorum. Here’s an idea in case you want a quick win: “SEA and good governance” governmentality. Look for the little cross in your Google search results against a Jorum DSpace result. Note that the same resource, hosted in other repositories, also appears in the search results, but without the cross, because those repositories don’t publish to the JLeRN node. If they all published to the node, along with usage data (paradata) about that resource, you’d be able to use one of the other tools to look at *all* the paradata for this resource, even when it is accessed in different places.

Paradata: information about how learning resources are used

As well as this descriptive information about web pages (in this case, metadata about learning resources), Learning Registry nodes are increasingly storing what is called paradata, which is information on how a resource is used. Imagine the same Google search as above, but this time you can see how popular resources are. What would normally be just a page now becomes a page used by 500 teachers in your subject area. Once a resource is used (and as long as someone tells the Learning Registry about it), other people can find this data and pick out the resources best suited to their needs.

So is paradata another great mystery to explain? Not really: it’s like seeing how often a book is cited, a link linked or a tweet retweeted. All that is different is that the data on reuse is in a slightly different format, and is shared outside of the silos where it lives right now.

How can you share paradata for your resources? Well, a lot of people use Google Analytics, and a page visit (one that Google Analytics tracks) is paradata; it just needs some tweaks before it can become data in a Learning Registry node. How can you do this? …

Pliny: submit your Google Analytics data to the Learning Registry

You can use Pliny, another tool developed by me, to share your Google Analytics paradata. Pliny uses OAuth to sign you into Google Analytics. It then accesses your analytics data and submits it to the Learning Registry for you (currently the tool at the above link submits to the JLeRN Alpha Node). It does all the hard work for you: all you need to do is click your mouse a few times.
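Pliny's own code isn't reproduced here. As a rough sketch of the "tweaks" step only, the function below wraps a page-view count in a Learning Registry resource_data envelope. The envelope field names follow our reading of the LR 0.23 specification, and the payload is loosely modelled on the LR Paradata 1.0 "activity" format. Treat both, along with the helper names paradata_envelope and publish_body, as assumptions to check against the specs before publishing anything.

```python
import json
from datetime import date


def paradata_envelope(page_url, views, start, end, submitter, tos_url):
    """Wrap a page-view count in an LR resource_data envelope (sketch).

    Field names are assumed from the LR 0.23 envelope and the LR Paradata 1.0
    "activity" payload; verify both against the specifications.
    """
    activity = {
        "verb": {
            "action": "viewed",
            "measure": {"measureType": "count", "value": views},
            # ISO 8601 interval covering the analytics reporting period
            "date": f"{start.isoformat()}/{end.isoformat()}",
        },
        "object": page_url,
        "content": f"{page_url} was viewed {views} times from {start} to {end}.",
    }
    return {
        "doc_type": "resource_data",
        "doc_version": "0.23.0",
        "resource_data_type": "paradata",
        "active": True,
        "identity": {"submitter_type": "agent", "submitter": submitter},
        "TOS": {"submission_TOS": tos_url},
        "payload_placement": "inline",
        "payload_schema": ["LR Paradata 1.0"],
        "resource_locator": page_url,
        "resource_data": {"activity": activity},
    }


def publish_body(envelopes):
    """Serialise envelopes as a publish request body: a JSON object with a "documents" list."""
    return json.dumps({"documents": envelopes})


# Example: a month of views for one page (the URLs are placeholders).
env = paradata_envelope("http://example.org/resource", 500,
                        date(2012, 5, 1), date(2012, 5, 31),
                        "Example Org", "http://example.org/tos")
```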

You can watch the short demo video below to see how Pliny works:

Ramanathan: submit metadata from your RSS feed to the Learning Registry

So we’ve looked at the benefits of the Learning Registry for teachers in terms of finding appropriate resources; how can people contribute? Well, Pliny allows paradata to be submitted, and another tool I developed, Ramanathan, takes an RSS feed and submits the metadata in that feed to a Learning Registry node.
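Ramanathan's code isn't shown here. The feed-to-node idea can be sketched with Python's standard ElementTree: read each RSS 2.0 item and turn it into a metadata envelope whose keys come from the item's category terms. The envelope field names are assumed from the LR 0.23 specification. The payload_schema label and the helper names rss_items and metadata_envelope are hypothetical, and Atom feeds would need different parsing.

```python
import xml.etree.ElementTree as ET


def rss_items(rss_xml):
    """Yield title, link, description and category terms for each RSS 2.0 <item>."""
    root = ET.fromstring(rss_xml)
    for item in root.iter("item"):
        yield {
            "title": item.findtext("title", default="").strip(),
            "link": item.findtext("link", default="").strip(),
            "description": item.findtext("description", default="").strip(),
            "keys": [c.text.strip() for c in item.findall("category") if c.text],
        }


def metadata_envelope(entry, submitter, tos_url):
    """Wrap one feed item as an LR metadata document (field names assumed from LR 0.23)."""
    return {
        "doc_type": "resource_data",
        "doc_version": "0.23.0",
        "resource_data_type": "metadata",
        "active": True,
        "identity": {"submitter_type": "agent", "submitter": submitter},
        "TOS": {"submission_TOS": tos_url},
        "payload_placement": "inline",
        "payload_schema": ["RSS 2.0 item"],  # hypothetical schema label
        "resource_locator": entry["link"],
        # The node indexes keys, so category terms become searchable tags.
        "keys": entry["keys"],
        "resource_data": {"title": entry["title"], "description": entry["description"]},
    }
```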

You can watch a short demo video below to see how Ramanathan works:

Just as there isn’t a single place for teachers to go to search, how do cataloguers work with such a decentralised model? Both Pliny and Ramanathan help data to be submitted, but there isn’t yet an easy tool to remotely manage metadata about your resources in a Learning Registry node. If you use either of these tools, you will be given permanent links to your documents on the Learning Registry node, but these are for viewing only; you cannot use them to delete or revise the documents.

Learning Registry Browser: find Learning Registry paradata for a web page

I’ve developed a second browser plugin for Chrome, the Learning Registry Browser. It tells you whether the page you are on has data in the Learning Registry, shows you the documents that have been submitted, and when they were submitted. Remember that, because other people might be using your resources, there may be documents that you did not submit. This is one of the benefits of the Learning Registry: it lets you track others’ use of your resources outside your own silo.
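Given a parsed obtain reply for a page, listing who submitted what, and when, takes only a short loop. This sketch assumes each envelope carries identity.submitter and a node-assigned create_timestamp in a single ISO 8601 format, so sorting the strings sorts chronologically. The function name submissions is hypothetical, and it is not the plugin's code.

```python
def submissions(obtain_response):
    """List (timestamp, submitter, data type) for every envelope in an obtain reply.

    Assumes envelopes carry identity.submitter and a create_timestamp string in
    one ISO 8601 format, so string order is chronological order.
    """
    rows = []
    for entry in obtain_response.get("documents", []):
        for doc in entry.get("document", []):
            rows.append((
                doc.get("create_timestamp", "unknown"),
                doc.get("identity", {}).get("submitter", "unknown"),
                doc.get("resource_data_type", "unknown"),
            ))
    return sorted(rows)
```

Rows whose submitter isn't you are exactly the "documents you did not submit" case.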

You can watch this brief demo video to see how it works (and see the note above in red, which will give you a sample search to try):

I appreciate that this is a lot of new things, but I hope you’ll find these tools helpful and are curious enough to consider submitting data to the Learning Registry.

JLeRN node upgraded

The node was previously running version 0.23.5 of the LR code but is now running version 0.23.7.  Version 0.23.6 was a major release and 0.23.7 was a minor release.

The release notes for version 0.23.6 state the following:

  • Data Services NEW
    • The extract API is included and enabled in this distribution. The Extract API brings a “batteries included” interface for tailoring data extraction and query to suit more narrow and varied use case needs without high resource requirements.
    • Data Service views are NOT installed by default. Node administrators can install any or all of the 3 standards alignment focused data services using Python CouchApp. This is done to prevent unneeded consumption of disk space for data services not utilized.
    • Interactive documentation explaining Data Services and how to roll your own for your use case.
  • Resource Data Distribution refactoring CHANGED
    • Distribute has been significantly refactored to reduce the need for 2x storage for documents. New distribute uses 2 DBs, incoming and resource_data. New _tainted documents can be distributed to incoming, and node may impose an internal policy for untainting documents before moving into resource_data. All harvest services still operate against resource_data.
    • More compatible with Learning Registry 0.23 specification. Document Types of resource_data are distributed, not resource_data_distributable.
    • Upgrade will require some minor configuration changes to NGINX to change or expose incoming endpoint.
    • IMPORTANT Legacy nodes can distribute to a 0.23.6 node by adjusting service documents to use the incoming endpoint. Legacy cannot be the destination for a 0.23.6 server.
  • Subscription for Distribute NEW
    • You may now visit http://[node address]/register and enter the URL of your node (use https if SSL is required to access it).
  • Support for CouchDB 1.2.0 NEW
    • This is a highly recommended upgrade. Significantly improves the storage and resource utilization, as well as paving the path for features planned for future releases.
  • Many bug fixes

I have not yet installed the Data Services views (pertaining to standards alignment) but if anybody wants them please tell me.

Next steps:
1. Test the new Extract API
2. Test distributing data between nodes

I’ll report back with the test results.

Node Explorer

We now have a prototype of a tiny app you can use to explore things on our node. Our app is based on some work, called LR statistics, previously done by the LR developers in the U.S. If you wanted to, you could clone the app and adapt it to explore any LR node. The source code is available in our JLeRN GitHub repository.

The app is here:
http://jlern.iriscouch.com/resource_data/_design/explorer/start.html

This version of the app runs on CouchDB in a cloud.  (It literally took seconds to create a CouchDB instance, hosted by Iris Couch.)  Once the prototype was ready, we replicated our resource data from the JLeRN Alpha node to the cloud and pushed the app there also.  The app is just another JSON document stored in CouchDB.

Tonight the document count is 15955.  This will not change if new documents are published to the JLeRN Alpha node.  Having said that, live updates could probably be implemented but this would require further testing.  [28 May: We have now implemented live updates.  With respect to IE 8 & 9, the app works as expected but the status information might say the node is down when it is actually up.  This has to do with your browser's security settings.  In your Internet Options, (1) allow "jlern.iriscouch.com" as a trusted site; (2) lower your security levels for that zone to Medium-Low; and (3) answer "yes" to the security question when you start the app.  Or contact us.]
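How the copy and its live updates were set up isn't spelled out above. One plausible route, sketched here as an assumption rather than a record of what was done, is CouchDB's built-in replication. You POST to the server's _replicate endpoint with a source and a target database, and "continuous": true keeps the target following new changes. The helper name replication_request is hypothetical, and the source URL in the example is a placeholder.

```python
import json
import urllib.request


def replication_request(couch_url, source, target, continuous=False):
    """Build a POST to CouchDB's /_replicate endpoint (not sent here).

    source/target may be full database URLs or database names local to couch_url.
    continuous=True asks CouchDB to keep copying new changes after the first pass.
    """
    body = {"source": source, "target": target}
    if continuous:
        body["continuous"] = True
    return urllib.request.Request(
        couch_url.rstrip("/") + "/_replicate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder source; sending this needs admin credentials on the target server.
req = replication_request("http://jlern.iriscouch.com",
                          "http://source.example:5984/resource_data",
                          "resource_data", continuous=True)
# urllib.request.urlopen(req)
```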

We could further develop the Node Explorer in any number of ways.  Please do let us know what you think would be useful.

Dunking your Cake into a tasty source of data

APIs make it easier for developers to interact with web applications and systems. To understand the capabilities of the Learning Registry, and how Jorum could potentially use and benefit from it, I decided to create a CakePHP Datasource for the Learning Registry API. CakePHP is my framework of choice: it is designed for rapid development and is ideal for prototyping web applications.

So far, I’ve created the READ parts of the LR Datasource. The LR Datasource abstracts the connection requirements and the intricacies of the API and helps connect Cake developers directly to the LR, allowing them to simply “plug-in” to a node and use the data obtained directly in their developments.
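The LR Datasource itself is written in PHP, and it lives in the repository linked below. To illustrate the idea of hiding a node's URL scheme behind a small read-only interface, here is a Python analogue. The class name LRReader and its method names are hypothetical. The service names (status, obtain, slice) and the request_ID, any_tags and identity parameters are the ones used in the example queries elsewhere on this blog.

```python
import json
import urllib.request
from urllib.parse import urlencode


class LRReader:
    """Hypothetical read-only wrapper around an LR node's HTTP services."""

    def __init__(self, node_url):
        self.node_url = node_url.rstrip("/")

    def url(self, service, **params):
        # Drop unset parameters so callers pass only what they need.
        query = urlencode({k: v for k, v in params.items() if v is not None})
        return f"{self.node_url}/{service}" + (f"?{query}" if query else "")

    def get(self, service, **params):
        """Fetch a service and return the parsed JSON."""
        with urllib.request.urlopen(self.url(service, **params)) as resp:
            return json.load(resp)

    def status(self):
        return self.get("status")

    def obtain(self, request_ID=None):
        return self.get("obtain", request_ID=request_ID)

    def slice(self, any_tags=None, identity=None):
        return self.get("slice", any_tags=any_tags, identity=identity)
```

As with the Datasource, application code only calls methods such as `LRReader("http://alpha.mimas.ac.uk").slice(any_tags="magnetism")` and never builds URLs itself.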

Recently, I have been working on the Jorum Dashboard, a web application that will provide an up-to-date, accessible window onto statistical data about Jorum resources. The dashboard is written in CakePHP, and I have been able to plug the LR Datasource directly into the application, giving me access to LR nodes (including the JLeRN nodes).

With the READ capabilities of the Datasource I can access paradata about resources and display that information within the Jorum Dashboard.

In the future I’d like to develop the Dashboard so that it submits DSpace statistics about resources, collects further usage data about them from other sources (including Topsy, Twitter, Google and Klout) and posts this paradata to a Learning Registry node.

You can find this and other CakePHP Datasources that I am working on at the following Git repository: https://github.com/cookiescrumbs/Datasource

What next…?

We have been looking at options for further work on the JLeRN node and paradata, essentially to create a sample interface. JavaScript will be used mainly for the front end, though this is all at an early stage. With all the data we currently have in our JLeRN node, it would be good to have a way to explore and use it through a web client. It is not yet clear what we might end up with, but we are considering some basic wireframes for the web pages. Next we will look at sharing and synchronising learning resources across the Alpha node (on Ubuntu) and the Beta node (on Windows Server 2008).

Keep watching this space for more progress.

The Windows Node at Mimas

I worked to establish a node on the Windows Server 2008 machine and, after some tweaking and self-teaching, got quite far with the installation. There were some issues getting Nginx set up correctly, though.

After some more tweaking and a restart of the machine, Nginx ran smoothly on the Windows 2008 server, but I wasn’t sure why the step that pushes the CouchApps was not working. I navigated to the config folder of the LR repository:
cd C:\Python27\LearningRegRepository\config
and ran the command

python setup_node.py

and was getting the following error:

Traceback (most recent call last):
  File "setup_node.py", line 12, in <module>
    import couchdb
ImportError: No module named couchdb

I checked the Python code, and the couchdb folder definitely exists, so I knew it was time to debug, which I guess is the part I love most. Lou, on the US Learning Registry team, suggested downloading version 1.1.1 from https://github.com/dch/couchdb/downloads
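An ImportError like this, when the package folder clearly exists on disk, often means that the interpreter running setup_node.py isn't the one the package was installed into, or that the folder isn't on that interpreter's module search path. The quick check below is written for Python 3. On the Python 2.7 used for the node, imp.find_module plays the same role. The function name module_location is hypothetical.

```python
import importlib.util
import sys


def module_location(name):
    """Return the file a module would be imported from, or None if it can't be found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


if __name__ == "__main__":
    # Run this with the same python that runs setup_node.py.
    print("interpreter:", sys.executable)
    print("couchdb found at:", module_location("couchdb"))
```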

Damon Regan from the US Learning Registry team noticed the blog post and was happy to see the work on a Windows machine. He mentioned that Lou Wolford on the LR team has worked on getting the LR running on Windows, and that there is an active pickling error they are working to resolve. He shared the discussion thread on it, https://github.com/LearningRegistry/LearningRegistry/pull/167, and said the Windows modifications were being discussed during the dev calls. He hoped that all the issues would be resolved and that I could get the node up and running soon.

Later on I resolved the import error by following Jim’s suggestion for Ubuntu (https://groups.google.com/forum/#!msg/learningreg-dev/0sKsLb15fi8/hFFhObPk69IJ), which was:


cd ~/gitrepos/LearningRegistry/LR
pip install -e ./

That should install ALL LR dependencies outside of wsgi, then:

cd ~/gitrepos/LearningRegistry/config
python setup_node.py -d

I used the same version of CouchDB as Lou suggested. The pickling error still persisted, so I went through the discussion thread Damon mentioned in his comment.

Lou commented that they were looking into a solution that detects which OS is running and switches between threads and processes, to keep everything stable whatever OS is being used. He promised to keep me updated and let me know as soon as it was released. Damon mentioned that John Poyau from the LR team was working on a new factory model to let Windows platforms use threads and Linux and Mac platforms use processes. Within a few days, John fixed the issue by switching to use Thread instead of Process in change_monitor, which removed the pickling_error on Windows. He also tweaked the way test files are declared to make them Windows-friendly.
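The OS-based switch can be sketched in Python. On Windows, multiprocessing has no fork, so it starts a fresh interpreter and pickles the Process object (including its target and arguments) across to it. Anything unpicklable, such as an open connection or a nested function, fails at that point. Threads share the parent's memory, so nothing is pickled. This is not the LR team's change_monitor code. It is an illustration of the factory idea described above, and worker_class and start_monitor are hypothetical names.

```python
import multiprocessing
import sys
import threading


def worker_class(platform=sys.platform):
    """Pick a concurrency primitive: threads on Windows, processes elsewhere.

    Windows has no fork, so multiprocessing pickles the target and its args
    for the child; threads avoid pickling entirely.
    """
    return threading.Thread if platform.startswith("win") else multiprocessing.Process


def start_monitor(target, *args):
    """Start a background worker of the platform-appropriate kind."""
    worker = worker_class()(target=target, args=args, daemon=True)
    worker.start()
    return worker
```

The trade-off is that a thread shares the interpreter (and its global lock) with the web server, whereas a separate process is isolated from it.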

Damon mentioned that the fix had been tested on Windows, Mac and Linux. It was not formally released yet, but it was in their stable master branch, so he recommended updating our experimental Windows node at Mimas from the stable master to obtain the fix. He suggested the following steps to upgrade the node to master:


1. Pull the latest master from git

cd /LearningRegistry
git checkout master
git pull

2. Run the setup node python script

python setup_node.py -d

He noted that it is important to re-run the setup node Python script, because configuration changes take place during updates. The LR team hopes to have an update script soon that will preserve our node settings on update.

I have now followed Damon’s steps: the pickling error no longer appears, and I am able to start the server on localhost. Watch this space for further developments on the Windows environment.

Jorum OAI-PMH data published

The starting point for this experiment was a document written by the LR developers in the U.S., the OAI-PMH to Learning Registry Publish Utility.  I installed the Python OAI-PMH third-party module and started working with the Python script provided by the LR developers, LR-harvest-and-publish.py.  The script needed some code to handle basic authentication with our JLeRN alpha node.  Then I successfully published some sample Jorum data but the JSON documents weren’t quite right.  The resource_locators (URIs) were incorrect and the keys (subject terms) were missing altogether.  In the case of Jorum, the keys correspond with Dublin Core subject terms in the OAI-PMH data.  I installed the Python ElementTree third-party module in order to parse the OAI-PMH data.  Then I located the subject terms and inserted them into the JSON document before publishing; that is, the LR application indexes the keys when the JSON document is published.  Then I managed to harvest all the Jorum OAI-PMH data and published it on our node.  (We have put our code in a repository on Github in case others can benefit from anything we happen to write.)
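The subject-term step described above can be sketched with ElementTree. This is not the code in the JLeRN repository. It is an illustration that assumes standard OAI-PMH ListRecords responses in oai_dc format, using the standard OAI-PMH, oai_dc and Dublin Core namespace URIs. Taking the first http identifier as the resource_locator is an assumption, and locators_and_keys and add_keys are hypothetical names.

```python
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def locators_and_keys(oai_xml):
    """Yield (resource URL, subject terms) for each oai_dc record in a ListRecords reply."""
    root = ET.fromstring(oai_xml)
    for record in root.iterfind(".//oai:record", NS):
        dc = record.find("oai:metadata/oai_dc:dc", NS)
        if dc is None:  # deleted records have a header but no metadata
            continue
        identifiers = [e.text.strip() for e in dc.findall("dc:identifier", NS) if e.text]
        # Assumption: the first http(s) identifier is the resource's landing page.
        locator = next((i for i in identifiers if i.startswith("http")), None)
        subjects = [e.text.strip() for e in dc.findall("dc:subject", NS)
                    if e.text and e.text.strip()]
        yield locator, subjects


def add_keys(envelope, locator, subjects):
    """Set resource_locator and keys on an LR envelope before it is published."""
    envelope["resource_locator"] = locator
    envelope["keys"] = subjects  # indexed by the node, so slice?any_tags= can find them
    return envelope
```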

Example queries:
http://alpha.mimas.ac.uk/obtain?request_ID=http://dspace.jorum.ac.uk/xmlui/handle/123456789/1000
http://alpha.mimas.ac.uk/slice?identity=jorum
http://alpha.mimas.ac.uk/slice?identity=mimas
http://alpha.mimas.ac.uk/slice?any_tags=curve
http://alpha.mimas.ac.uk/slice?any_tags=magnetism

Alpha node

We installed the JLeRN “alpha” node on our new Ubuntu server on the weekend (Jan 21-22) before the Hackday.  But there was a bug which affected the indexes and thus document retrieval.

The LR developers in the U.S. resolved the bug promptly and we applied the fix on Jan 28th.

The node is here: http://alpha.mimas.ac.uk

The node supports all publish and retrieval services although we haven’t had a chance to test all of these yet.

The node is now open to the world, except that you need a username and password to publish documents. The credentials you need to publish are:

Username: fred
Password: flintstone

Give it a go when you have a minute or two.

Examples:
http://alpha.mimas.ac.uk/status
http://alpha.mimas.ac.uk/description
http://alpha.mimas.ac.uk/obtain  [all documents in node, 100 at a time]
http://alpha.mimas.ac.uk/obtain?request_ID=http://dspace.jorum.ac.uk/xmlui/handle/123456789/1000
http://alpha.mimas.ac.uk/slice?any_tags=curve
http://alpha.mimas.ac.uk/slice?any_tags=magnetism

N.B.: The examples above fetch JSON documents. Install an extension for your favourite browser so that it displays the documents in a readable format; e.g., the JSONView extension for Firefox.
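The bare obtain example above returns documents a page at a time (100 per page). As we understand the LR obtain service, each page carries a resumption_token that you pass back to get the next page, and the token is empty or null on the last page. The sketch below walks every page under that assumption. The names iter_obtain and fetch_json are hypothetical, and fetch is injectable so the logic can be tested without a network.

```python
import json
import urllib.request
from urllib.parse import urlencode


def fetch_json(url):
    """GET a URL and parse the JSON reply."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def iter_obtain(node_url, fetch=fetch_json):
    """Yield every entry from a node's obtain service, following resumption tokens."""
    base = node_url.rstrip("/") + "/obtain"
    url = base
    while True:
        reply = fetch(url)
        yield from reply.get("documents", [])
        token = reply.get("resumption_token")
        if not token:  # assumed: a missing, null or empty token marks the last page
            return
        url = base + "?" + urlencode({"resumption_token": token})
```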

Curl examples:
$ curl -v "http://alpha.mimas.ac.uk/obtain?request_ID=http://dspace.jorum.ac.uk/xmlui/handle/123456789/1000"
$ curl -v -X POST -H "Content-Type: application/json" "http://alpha.mimas.ac.uk/publish" -d @test_data.json -u fred
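The same publish call can be built from Python using only the standard library. The OAI-PMH script described earlier also needed basic authentication for the node. The function name publish_request is hypothetical. The {"documents": [...]} body is our understanding of what the publish service expects, which is also the shape test_data.json would need.

```python
import base64
import json
import urllib.request


def publish_request(node_url, documents, username, password):
    """Build (but don't send) an authenticated POST to a node's publish service."""
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return urllib.request.Request(
        node_url.rstrip("/") + "/publish",
        data=json.dumps({"documents": documents}).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
        method="POST",
    )
```

To send it, pass the request to `urllib.request.urlopen` and read the JSON reply, much as curl prints it above.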

The Learning Registry Quick Reference Guide has more examples.

Java on Learning Registry…

The JLeRN Hackday was such an involving event, and I learned a lot. I got a few more doubts cleared up with Nick, and got a chance to speak with Scott about his work on LRJavaLib, the Java library for the Learning Registry services. I downloaded bencode from http://bit.ly/AecnXF, untarred it, and installed the jar file using Maven.
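bencode is the simple serialisation format defined in the BitTorrent specification. As far as we understand it, the Java library needs it because Learning Registry document signing works on a bencoded form of the envelope. The exact normalisation LR applies before encoding (for example, to booleans and nulls) is not covered here. The minimal encoder below (shown in Python, as with the other examples here) just shows what the format looks like, and the function name bencode is our own.

```python
def bencode(value):
    """Encode ints, strings/bytes, lists and dicts in bencode (BitTorrent BEP 3)."""
    if isinstance(value, bool):  # bool is an int subclass, but bencode has no booleans
        raise TypeError("bencode has no boolean type")
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, (list, tuple)):
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings, emitted in sorted (raw byte) order.
        items = sorted(
            ((k.encode("utf-8") if isinstance(k, str) else k, v) for k, v in value.items()),
            key=lambda kv: kv[0],
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")
```

Sorted keys are what make the encoding deterministic, which is why it suits hashing and signing.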

Now I am trying to develop some Java code, using JAX-RS in the Eclipse IDE, to automate calls to all the services.

I also need to look at the Data Services proposal from the developers’ list at http://bit.ly/wScfDk, which seems quite promising.
