John Philip McCrae

Lexical Linked Data Case Study: ALPINO Treebank - Part 2

Following on from the previous post, we will now create a SPARQL endpoint so that we can query the data. To do this we will use the lightweight engine 4store. The first task is to install 4store; on an Ubuntu-based machine this is simply achieved with

sudo apt-get install 4store

Otherwise it may be necessary to install it following the instructions.

Once 4store is installed, we simply create a database, start the back-end and load all the data:

4s-backend-setup alpino
4s-backend alpino
for file in $(find . -name '*.rdf')
do
    # Strip the leading "./" and the file extension to get the model name
    fileBase=$(echo "$file" | sed 's/\.\/\(.*\)\..*/\1/')
    4s-import alpino -v -a -m "$fileBase" "$file"
done

Note that, as the RDF files produced by the XSLT do not specify their own URI, we must take care when loading the data that 4store uses the right URIs. This is why the model name is passed explicitly with -m for each file.
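To make the naming step concrete, here is a small sketch of how the value passed to -m can be derived from a path. It uses shell parameter expansion in place of sed, and the file name in the example call is hypothetical, not one from the actual corpus.

```shell
#!/bin/sh
# graph_name PATH: derive the model name from a relative path as printed
# by `find .`, by dropping the leading "./" and the final file extension.
graph_name() {
    path=${1#./}                 # remove a leading "./" if present
    printf '%s\n' "${path%.*}"   # remove the last ".ext" suffix
}

# Hypothetical example path (not taken from the corpus):
graph_name ./wiki/file1.rdf   # prints: wiki/file1
```

The result is the same for paths of the form ./dir/name.rdf; parameter expansion simply avoids spawning a sed process per file.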

Next we set up the web connector on an arbitrary (firewalled) port:

4s-httpd alpino -p 8888
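Before putting anything in front of the server, it can be worth checking that it answers queries locally. The following is a minimal sketch, assuming curl is installed and that the endpoint lives at /sparql/ on port 8888 (the address the PHP script further down uses). The function name sparql_get and the SPARQL_ENDPOINT variable are made up for this illustration.

```shell
#!/bin/sh
# Endpoint of the local 4s-httpd instance; SPARQL_ENDPOINT is a
# hypothetical override in case a different port was chosen.
SPARQL_ENDPOINT=${SPARQL_ENDPOINT:-http://localhost:8888/sparql/}

# sparql_get QUERY: send QUERY as a URL-encoded GET parameter, print the
# response body, and return non-zero unless the HTTP status is 200.
sparql_get() {
    body=$(mktemp) || return 1
    code=$(curl --silent --get --data-urlencode "query=$1" \
                --output "$body" --write-out '%{http_code}' \
                "$SPARQL_ENDPOINT")
    cat "$body"
    rm -f "$body"
    [ "$code" = "200" ]
}

# Example call (requires the server to be running):
#   sparql_get 'SELECT * WHERE { ?s ?p ?o } LIMIT 10'
```

Letting curl do the URL encoding avoids hand-escaping the braces and question marks in the query.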

Now we need to make it available on the web. We will do this through a PHP script, as the default HTTP interface for 4store is not particularly user-friendly.

I wrote the following PHP script for this:

<?php
// With no query parameter, show the query form; otherwise proxy to 4store.
if(!isset($_REQUEST["query"])) { ?>
 <title>ALPINO corpus query</title>
 <form action="" method="get">
 <label for="query">Query:</label><br/>
 <textarea id="query" name="query" rows="5" cols="80">
PREFIX cat: <>
PREFIX rdf: <>
SELECT * WHERE { ?s ?p ?o } LIMIT 10
 </textarea>
 <input type="submit"/>
 </form>
<?php } else {
// Forward the query to the local 4store HTTP server
$ch = curl_init();
$url = "http://localhost:8888/sparql/?query=" . urlencode($_REQUEST["query"]);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$data = curl_exec($ch);
$code = curl_getinfo($ch,CURLINFO_HTTP_CODE);
curl_close($ch);
if($code == 200) {
 header("Content-type: application/sparql-results+xml");
 echo $data;
} else {
 // Relay the error message from 4store unchanged
 echo $data;
}
}
?>
Now the final step is to register the resource with CKAN. To do this we simply go to the website, create a user account and fill in the form thus:

In particular we added the following URLs

Finally, we send a mail to the open linguistics list to announce the resource to the Open Linguistics Working Group.