METIS2OpenAIRE

This is a page for information on the OpenAIRE-funded METIS2OpenAIRE project. The project is running from the start of February until mid-May 2018. This page aims to collect posts on the project progress, on wider aspects related to CRIS interoperability and, once the project comes to an end, on the further advances in making CRIS systems beyond METIS OpenAIRE-compatible.

The following texts have been posted thus far:

     - Jun 3rd, 2019: "CRIS in OpenAIRE - we take you onboard", by Pablo de Castro, METIS2OpenAIRE coordination
     - Apr 30th, 2019: Data quality issues around CRIS harvesting by OpenAIRE, by Pablo de Castro, METIS2OpenAIRE coordination, and Aenne Löhden, OpenAIRE/Universität Bielefeld
     - Mar 1st, 2019: First two CRIS systems test harvested by OpenAIRE (beta service), by Pablo de Castro, METIS2OpenAIRE coordination
     - Nov 30th, 2018: CERIF-XML Guidelines for CRIS Managers start getting implemented, by Pablo de Castro, METIS2OpenAIRE coordination
     - Jul 12th, 2018: OpenAIRE compatibility for CRIS systems discussed at the CRIS2018 conference in Umeå, by Pablo de Castro, METIS2OpenAIRE coordination
     - May 25th, 2018: Test-driving the OpenAIRE CRIS Guidelines 1.1, by Jan Dvořák, technical lead for METIS2OpenAIRE
     - May 8th, 2018: Making the METIS CRIS at Radboud University OpenAIRE-compatible, by Ed Simons, METIS2OpenAIRE lead at Radboud University Nijmegen
     - Feb 26th, 2018: METIS2OpenAIRE: Adding CRIS systems to the list of OpenAIRE data providers, by Pablo de Castro, METIS2OpenAIRE coordination

 

 

Jun 3rd, 2019

"CRIS in OpenAIRE - we take you onboard"
Pablo de Castro, University of Strathclyde and euroCRIS
https://orcid.org/0000-0001-6300-1033

The Spring 2019 euroCRIS membership meeting held last week at CSC in Espoo/Helsinki saw significant progress in the area of CRIS harvesting by OpenAIRE. On Mon May 27th a Memorandum of Understanding between euroCRIS and OpenAIRE was announced "for the realisation and optimal functionality of the research information ecosystem". A 3-hr developers' workshop for CRIS managers interested in the implementation of the OpenAIRE CERIF-XML Guidelines for CRISs was held that same Monday. And the presentation "CRIS in OpenAIRE - we take you onboard" was delivered by Andreas Czerniak and Aenne Löhden of Uni Bielefeld/OpenAIRE on Tue May 28th within the main strand of the meeting.

Besides describing the CRIS validation process and its eventual advantages, this presentation provided an update on the progress around the test harvesting of CRIS systems. Five operational CRIS systems have already been validated against the CERIF-XML Guidelines, together with a sixth test platform (for Pure). These five operational CRISs are: 

While only the Danish test platform for Pure has been validated thus far, institutional instances of this CRIS will soon be available for testing too, since the version that will make it possible is already in the pipeline for institutional implementation. This, together with the expressions of interest raised by other platforms during the developers' workshop, is likely to result in the gradual increase in the number of validated CRISs.
 


The euroCRIS DRIS (Directory of Research Information Systems) is being expanded in order to make sure that every potential candidate for validation is included in the database. The mid-term goal would be for the OAI-PMH endpoint URLs to be added to the CRIS records in the DRIS so that some provenance identification may be associated to an automated CRIS metadata harvesting process by OpenAIRE.

 

 

Apr 30th, 2019

Data quality issues around CRIS harvesting by OpenAIRE
Pablo de Castro, University of Strathclyde and euroCRIS and Aenne Löhden, Universität Bielefeld and OpenAIRE
https://orcid.org/0000-0001-6300-1033

As part of the process for aggregating metadata feeds exposed by CRIS systems via their OAI-PMH endpoints, the OpenAIRE team at the University of Bielefeld are carrying out data quality tests on such feeds. Unsurprisingly, they are finding plenty of data quality issues in the metadata delivered by CRISs. These systems were not designed to expose institutional information to the outside world in the first place and, unlike institutional repositories, they have never had guidelines for metadata formatting applied to them. The value of the contextual metadata that CRISs contain makes it worth taking them in once the data quality issues are tackled, and it's also a good move for such systems to try and improve their metadata formatting standards. As the METIS institutional CRIS lead at Radboud University Nijmegen and euroCRIS president Ed Simons put it some months ago, "The process for exposing the METIS feed via the OAI-PMH endpoint is actually providing a good opportunity to harmonise and complete the underlying data model structure in METIS".

For the time being, metadata quality is being checked case by case for the first CRISs being test harvested. A pattern is quickly emerging, though, that will allow the data quality tests to be carried out in a more structured way. These quality checks use the CERIF-XML Guidelines for CRIS Managers as a basis, and they are a first for CRIS systems, which have never been tested this way before. Having the metadata feed from all OpenAIRE data providers – including CRIS systems – tested against standards like the COAR Controlled Vocabularies is definitely a good step towards their harmonisation.

The emerging structure for CRIS metadata quality testing is shown below:

  1. Content-related metadata
         1.1. Resource types
         1.2. Subjects and keywords
         1.3. Language attribution
         1.4. Issues around specific item types (eg reviews)
  2. Provenance-related metadata
         2.1. Author identifiers
         2.2. Container publication identifiers
  3. Contextual metadata
         3.1. Date and version
         3.2. Project funding
  4. Metadata on access and use
         4.1. Landing pages
         4.2. Access rights and licences
  5. Other issues
     

This semi-structured data quality check has only been applied to a few CRIS feeds so far and may subsequently be subject to some fine-tuning, but it already contains the key sections that will be verified for future validations.

 

 

Mar 1st, 2019

First two CRIS systems test harvested by OpenAIRE (beta service)
Pablo de Castro, University of Strathclyde and euroCRIS
https://orcid.org/0000-0001-6300-1033

As of early March 2019, i.e. approximately one year since the launch of the (now officially finished) METIS2OpenAIRE project, the first two CRIS systems have already been test harvested by OpenAIRE through their specifically enabled OAI-PMH endpoints. These are METIS at Radboud University Nijmegen in the Netherlands and the CSC-hosted VIRTA national-level CRIS in Finland.
 

Both systems have exposed their metadata feed via specifically enabled OAI-PMH endpoints and have validated such feeds against the CERIF-XML Guidelines for CRIS Managers v1.1.1 by means of the minimally sufficient validator built by the METIS2OpenAIRE project.

Both METIS@Radboud and VIRTA are still exposing test environments while they adjust their internal configurations to best meet the formatting requirements for their metadata, but the number of records being harvested by OpenAIRE is already very significant.
 

 

The next steps are for the OpenAIRE Advance team in Bielefeld to provide feedback to the technical leads for both platforms in order to fix any formatting issues that are identified, and for the test harvesting to extend to additional CRIS systems that are able to validate their metadata feeds against the CERIF-XML Guidelines.

 

Nov 30th, 2018

CERIF-XML Guidelines for CRIS Managers start getting implemented
Pablo de Castro, University of Strathclyde and euroCRIS
https://orcid.org/0000-0001-6300-1033

A few presentations related to the implementation of the OpenAIRE CERIF-XML Guidelines for CRIS Managers were delivered at the recently held Autumn 2018 euroCRIS Strategic Membership Meeting in Warsaw. The main one was this "METIS2OpenAIRE: an update on the process for implementing the CERIF-XML Guidelines for CRIS Managers" presented by the METIS2OpenAIRE project leads Jan Dvořák and Pablo de Castro.

In view of the increasing number of CRIS systems attempting to expose their metadata feed via an OAI-PMH endpoint and have it tested with the minimally sufficient validator developed by the METIS2OpenAIRE project, Jan Dvořák provided a series of recommendations for this process based on the pioneering experience of making the METIS system at Radboud University Nijmegen compliant with these guidelines.
 

METIS@Radboud is in fact the first CRIS whose OAI-PMH endpoint URL has been publicly made available (during Jan's presentation). Other additional systems will soon follow suit though, such as VIRTA in Finland, who also had a presentation in the event programme by Joonas Nikkanen (CSC) and Dragan Ivanovic, and Pure, whose involvement as presented by Anna Clements has so far extended to a mapping from its data model to the CERIF-XML Guidelines. The membership meeting hosts at the Warsaw University of Technology are also progressing with the exposure of the CRIS metadata feed for their own Omega-PSIR@WUT system.

The Autumn 2018 meeting also saw a discussion on the planned expansion of the euroCRIS Directory of Research Information Systems (DRIS) and its possible use for providing persistent IDs to the CRIS systems listed in it, together with a specific field to store the OAI-PMH endpoint URL.

 

 

Jul 12th, 2018

OpenAIRE compatibility for CRIS systems discussed at the CRIS2018 conference in Umeå
Pablo de Castro, University of Strathclyde and euroCRIS
  https://orcid.org/0000-0001-6300-1033

The recently released June 2018 edition of the OpenAIRE Newsletter features a reference to the METIS2OpenAIRE project. The mid-June CRIS2018 conference in Umeå saw a number of presentations (by Omega-PSIR, Pure and the METIS2OpenAIRE project itself) and discussions on OpenAIRE compatibility for CRIS systems. Following these, a summary of the current progress in the area and an overview of the next steps to take in order for CRIS metadata feeds to start featuring in the OpenAIRE aggregation was jointly written by the METIS2OpenAIRE team and the technical lead for CRIS interoperability within the OpenAIRE Advance project in Bielefeld.
 

As mentioned in the blogpost, the minimally sufficient validator developed by the project in order to test the compatibility of the metadata feed for the METIS institutional CRIS at Radboud Uni with the OpenAIRE Guidelines for CRIS Managers will for the time being not be made openly available for any external platform to test its own compatibility. While its functionality is still being tested and its transfer to the OpenAIRE hub in Athens is being completed, it will only be applied to testing the metadata feeds of the CRIS solutions involved in the METIS2OpenAIRE project. It nevertheless remains a main project objective that the validator eventually becomes fully available for external testing.

 

 

May 25th, 2018

Test-driving the OpenAIRE CRIS Guidelines 1.1
Jan Dvořák, euroCRIS, Charles University Prague and Czech Technical University
  https://orcid.org/0000-0001-8985-152X

An overview of the prototype validator that has been developed to assess compliance of a CRIS with the OpenAIRE Guidelines for CRIS Managers 1.1.

Within the METIS2OpenAIRE project we have developed a piece of software that checks whether a CRIS actually complies with the OpenAIRE Guidelines for CRIS Managers v1.1. It is called the “prototype OpenAIRE CRIS validator” and can be found on GitHub. Even if you don't have a live CRIS OAI-PMH endpoint at hand you can test-drive it on the example files that come with the OpenAIRE Guidelines for CRIS Managers 1.1 standard (Spoiler: all tests pass, see the figure below).

What does the validator test? The main thing is that it tries to get the CRIS metadata from all nine sets (i.e. object types) defined by the standard. For every record the tool checks if the XML metadata is valid with respect to the XML Schema that accompanies the standard specification. Also, we check if the OAI identifiers of the objects are well-formed and correspond to the type of the object and its internal identifier. We also watch for possible non-unique (i.e., repeated) identifiers and enforce a few other basic expectations.
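As an illustration only, the identifier checks described above could be sketched in Python as follows. This is not the validator's actual code: the identifier layout `oai:<authority>:<Type>/<internal id>`, the function name `check_identifiers` and the tuple-based input are all assumptions made for the example.

```python
import re
from collections import Counter

# Assumed (hypothetical) identifier layout: oai:<authority>:<Type>/<internal-id>
OAI_ID = re.compile(r"^oai:(?P<authority>[^:]+):(?P<type>[A-Za-z]+)/(?P<local>.+)$")


def check_identifiers(records):
    """Check (oai_identifier, object_type, internal_id) tuples.

    Returns a list of human-readable problem descriptions.
    """
    records = list(records)
    problems = []

    # Repeated identifiers: each OAI identifier should occur exactly once.
    counts = Counter(oai_id for oai_id, _, _ in records)
    for oai_id, n in counts.items():
        if n > 1:
            problems.append(f"{oai_id}: identifier appears {n} times")

    # Well-formedness, and agreement with the object's type and internal id.
    for oai_id, object_type, internal_id in records:
        match = OAI_ID.match(oai_id)
        if match is None:
            problems.append(f"{oai_id}: not well-formed")
        elif match["type"] != object_type or match["local"] != internal_id:
            problems.append(f"{oai_id}: does not match {object_type}/{internal_id}")
    return problems
```

An empty list means that no identifier problems were found in the records supplied.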

On top of this, three endpoint-level requests are made and their results carefully analyzed.
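The post does not name the three requests. The OAI-PMH protocol does define three verbs that describe an endpoint as a whole rather than individual records (Identify, ListMetadataFormats and ListSets), so the sketch below assumes those are the ones meant. The base URL and the function name are placeholders.

```python
from urllib.parse import urlencode

# Assumption: the three endpoint-level requests are these standard OAI-PMH verbs.
ENDPOINT_LEVEL_VERBS = ("Identify", "ListMetadataFormats", "ListSets")


def endpoint_request_urls(base_url):
    """Build one request URL per endpoint-level verb for the given base URL."""
    return [f"{base_url}?{urlencode({'verb': verb})}" for verb in ENDPOINT_LEVEL_VERBS]
```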

But that’s not all there is to it. Once the whole bulk of metadata is retrieved from the CRIS, it undergoes a complex test for referential integrity and functional dependency. The former means that if a CERIF object is mentioned in a foreign context, e.g. a person as an author of a publication, that person has a full, first-class record in the openaire_cris_persons set. The latter check looks for contradictions in the data, for example two different titles for the same project (in the same language).
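A minimal sketch of the two checks, assuming the harvested metadata has already been reduced to simple Python tuples. The function names and data shapes are hypothetical and are not taken from the validator itself.

```python
from collections import defaultdict


def dangling_references(full_records, references):
    """Referential integrity check.

    full_records: set of (type, id) pairs that have a full record of their own.
    references: iterable of (source, target) pairs, each a (type, id) tuple.
    Returns the references whose target has no full record.
    """
    return [(source, target) for source, target in references
            if target not in full_records]


def conflicting_values(statements):
    """Functional dependency check.

    statements: iterable of (key, value) pairs, where the key names something
    that should have a single value, e.g. (type, id, "title", language).
    Returns the keys that were seen with more than one distinct value.
    """
    seen = defaultdict(set)
    for key, value in statements:
        seen[key].add(value)
    return {key: values for key, values in seen.items() if len(values) > 1}
```

In this sketch a publication that names a person with no record of their own shows up in `dangling_references`, and a project with two different English titles shows up in `conflicting_values`.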

And no, we are not just sampling the CRIS endpoint. We actually fetch its complete metadata contents. These are stored in files for diagnostics and/or other use.
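Fetching the complete contents of an OAI-PMH set means following resumption tokens until the endpoint stops issuing them. The sketch below shows that loop using only the Python standard library. It is an illustration under assumptions, not the validator's implementation: the function names, the base URL and the metadata prefix passed in by the caller are placeholders.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def http_fetch(url):
    """Default fetcher: return the raw response body for a URL."""
    with urlopen(url) as response:
        return response.read()


def harvest_set(base_url, set_spec, metadata_prefix, fetch=http_fetch):
    """Yield every <record> element of one set, following resumption tokens."""
    params = {"verb": "ListRecords", "set": set_spec,
              "metadataPrefix": metadata_prefix}
    while True:
        root = ET.fromstring(fetch(f"{base_url}?{urlencode(params)}"))
        yield from root.iter(f"{OAI_NS}record")
        token = root.find(f".//{OAI_NS}resumptionToken")
        # A missing or empty token marks the last page of the list.
        if token is None or not (token.text or "").strip():
            return
        # In OAI-PMH a follow-up request carries only the verb and the token.
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}
```

The `fetch` parameter is there so the loop can be exercised against canned responses, or wrapped to write each page to a file, without touching a live endpoint.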

So how does a real CRIS fare in this labyrinth of integrity constraints? The METIS installation at Radboud University in Nijmegen, the Netherlands, is the first CRIS with an interface that complies with the OpenAIRE CRIS Guidelines 1.1 standard. We are still hunting some bugs that are related to the data entry of individual, mostly legacy records, so we are not yet “all green”, but we are coming very close. On recent harvests the tool harvested over 300,000 metadata records, close to 350 MBytes worth of metadata. METIS populates 5 out of the total 9 sets (not many CRISs could populate all of them).

Soon we’ll be trying the validator on a second CRIS, Omega-PSIR at the Warsaw University of Technology. And we are confident the software will be useful for DSpace-CRIS, for PURE and for any other CRIS that sets out on the path towards OpenAIRE compatibility.

 

 

May 8th, 2018

Making the METIS CRIS at Radboud University OpenAIRE-compatible
Ed Simons, Radboud University Nijmegen and euroCRIS President
  http://orcid.org/0000-0002-3019-3988

METIS is the institutional CRIS at Radboud University in Nijmegen, the Netherlands. It's been running since 1992 and it serves 5000+ researchers at the institution. The Dec'2017 OpenAIRE call for proposals was seen as a good opportunity to make METIS OpenAIRE-compatible. Radboud University already runs a repository that is OpenAIRE-compliant, but this chance for an extended information exchange with OpenAIRE via the recently released CERIF-XML Guidelines for CRIS Managers seemed a good opportunity to export a wider range of metadata elements beyond the basic bibliographic information.

Key among these are organizational affiliation, project and dataset metadata. 
 

The METIS2OpenAIRE proposal was awarded funding by OpenAIRE at the beginning of the year and the 3.5-month project started early Feb 2018 with the goal to validate the METIS feed exposed via a specifically enabled OAI-PMH endpoint against the CERIF-XML OpenAIRE Guidelines. 

The minimally sufficient validator itself is being built on the basis of this early testing for METIS. The end-goal is that this validator will eventually – once it gets fine-tuned and tested on other CRIS feeds – be made available for any CRIS system to test its own OpenAIRE compatibility.

The process for exposing the METIS feed via the OAI-PMH endpoint is actually providing a good opportunity to harmonise and complete the underlying data model structure in METIS. This is a mid-term institutional objective that started in earnest with the P-O-PF project for information exchange among institutional CRIS systems in the Netherlands, both with each other and with a research funder CRIS.

The METIS2OpenAIRE project directly benefits the further development and refinement of both the METIS CRIS at Radboud University and the OpenAIRE CERIF-XML Guidelines themselves. Regarding the latter, several adjustments to the Guidelines have already been taken in by the editing team based on the practical experience gained from the project. To give a concrete example: during the project it became obvious that some of the linking relationships foreseen in the guidelines included some form of duplication. Removing the linking relationships for some of the entities and/or redefining the relationships was then deemed necessary, as this meant a substantial improvement of the guidelines.

As for the METIS CRIS, the implementation of the Guidelines revealed some shortcomings and gaps in the current data model for the system that will need to be addressed in the near future. It turned out that specific information on 'funding' is missing in the CRIS right now, as are some of the information elements on 'projects'. This finding is a direct consequence of the 'old tradition' in which CRISs emerged as reporting systems mainly focused on outputs or research results and not so much as fully-fledged research information management and profiling systems of use for the research managers and researchers themselves. This view is however rapidly changing and a project like METIS2OpenAIRE certainly helps to raise awareness of the changing position and role of CRISs and hence of the need to review and update the "traditional" CRIS data model and functionality. As such the project clearly has a broader impact and meaning, not only on the further development of the METIS CRIS, but as an inspiration for CRIS development, policy-making and implementation in general.

We are very much hoping that this first step towards CRIS interoperability with OpenAIRE will allow other systems to also progress along this line. METIS is presently running at only two Dutch institutions, Radboud University Nijmegen and Erasmus University Rotterdam, but Omega-PSIR and Pure, also part of the METIS2OpenAIRE project as budget-neutral partners, will provide the opportunity for a much wider implementation of these CERIF-XML Guidelines for CRIS Managers in the course of this year.

 

 

Feb 26th, 2018

METIS2OpenAIRE: Adding CRIS systems to the list of OpenAIRE data providers
Pablo de Castro, University of Strathclyde and euroCRIS
  https://orcid.org/0000-0001-6300-1033

The OpenAIRE-funded METIS2OpenAIRE project run by euroCRIS aims to start making CRIS systems OpenAIRE-compatible, thus unlocking the mechanism that will allow institutions to join the list of OpenAIRE data providers through their CRISs. This will be achieved via the implementation of the recently released OpenAIRE CERIF-XML Guidelines for CRIS Managers v1.1 on a first institutional CRIS, namely METIS at Radboud University in Nijmegen, Netherlands.

Adding CRIS systems to the other categories already displayed on the OpenAIRE directory of data providers will mean significant progress in several areas:

  • While most institutions in Europe rely on their institutional repositories for delivering the relevant information on EU-funded project outputs to OpenAIRE, there are specific cases where this is not technically feasible – mainly due to the lack of system interoperability. For instance, fewer than 10% of the 258 UK publication repositories listed in OpenDOAR are currently OpenAIRE-compliant, which means that many UK institutions with a large number of EU-funded projects are unable to deliver their project outputs into the OpenAIRE aggregation unless they post duplicates in Zenodo

  • National CRIS systems are frequently the default research information management platform in countries where the repository infrastructure is not yet fully consolidated. Again, because of the lack of interoperability with such national-level platforms, a significant fraction of EU-funded project outputs from such countries fails to ever reach the OpenAIRE aggregation

  • The volume of contextual metadata for research output records harvested from CRIS systems (including research data and text-based outputs) will be far larger than currently. CRIS records typically contain detailed information on aspects like organisational affiliation and research funding that may not be so thoroughly covered by other platforms. This additional input will potentially allow OpenAIRE to step up one gear in the quality of the information offered from the aggregator.

Besides directly working on a first case study for a specific institutional system, the METIS2OpenAIRE project is deeply aware of the need to promote the expansion of CRIS compatibility with OpenAIRE beyond a few first cases. This is why euroCRIS have teamed up with two external, budget-neutral partners for this initiative. These external partners will be able to benefit from the mapping exercise carried out on METIS at Radboud University to progress with their own implementations of the OpenAIRE Guidelines. These two ‘travel companions’ for METIS2OpenAIRE are:

  • OMEGA-PSIR at the Warsaw University of Technology (WUT). OMEGA-PSIR is a CRIS solution widely implemented in Poland. Originally developed at WUT, it’s now running at 16 Polish institutions, which run their own User Group.

  • Elsevier PURE. With over 200 implementations worldwide, this is the most widely adopted CRIS solution. PURE becoming OpenAIRE compatible – which is planned to happen next October upon the release of version 5.13 – will mean (i) that many institutions currently unable to share their outputs with OpenAIRE will start doing so and (ii) that the interoperability with institutional repositories coupled to PURE will much improve as a result.

Stakeholders external to the METIS2OpenAIRE project will follow their own schedules for the implementation of the OpenAIRE Guidelines for CRIS Managers, i.e. they haven’t committed to meet the three-and-a-half-month project lifetime that applies to METIS2OpenAIRE. This collaboration with external stakeholders should however ensure that the OpenAIRE list of CRIS data providers shown above will soon start featuring quite a few new entries.