A few thoughts on OA Monitoring and CRISs (II)

Tue, 03/04/2018 - 14:37 -- euroCRIS Secret...
Pablo de Castro

Following the general considerations laid out in the first part, this second part of the post focuses on the potential role that CRIS systems, at institutional or (especially) national/regional level, could play in supporting the task of monitoring progress in Open Access implementation. The text is again intended to offer just a few reflections on a number of areas where CRISs could effectively contribute to this effort, with no intention of being comprehensive: it is above all meant to prompt a discussion at the AT2OA workshop next week.

The first aspect to consider is that, to monitor the percentage of research outputs available Open Access at an institution, a country or a region, the numerator and the denominator are equally important. This may sound rather obvious, but in the joint effort towards making as many academic outputs openly available as possible, platforms have traditionally placed far more emphasis on the numerator, i.e. those outputs that are already Open Access. Institutional repositories were originally designed as Open Access repositories, meaning that only items for which the full text was available would be deposited in them. Institutional platforms aiming to provide (for instance) a list of publications for researchers' CVs soon started adding plenty of metadata-only records to the repository collection, but these were suspected of breaking the original rule that every single institutional output should be made openly available from the repository. The result of this emphasis on achieving as high a rate of openly available outputs in repositories as possible is that the denominator, i.e. the total number of institutional, national or regional publications, is frequently neglected.
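To make the numerator/denominator point concrete, here is a minimal sketch in Python. The `Output` record and its `open_access` flag are hypothetical names chosen for illustration, not fields of any particular repository or CRIS platform. It simply shows how the same set of openly available outputs yields a very different rate once metadata-only records are counted in the denominator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Output:
    """Hypothetical, simplified publication record."""
    identifier: str
    open_access: bool  # True if a full text is openly available


def oa_share(outputs):
    """Return the fraction of outputs that are Open Access, or None if there are none."""
    outputs = list(outputs)
    if not outputs:
        return None  # no denominator, so no meaningful rate
    return sum(o.open_access for o in outputs) / len(outputs)


# A full-text-only repository sees just the numerator...
repository_only = [Output("a", True), Output("b", True)]
# ...while a CRIS-style collection also holds the metadata-only records.
cris_records = repository_only + [
    Output("c", False),
    Output("d", False),
    Output("e", False),
]

print(oa_share(repository_only))  # 1.0
print(oa_share(cris_records))     # 0.4
```

The two figures describe the same two open outputs; only the denominator has changed.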

Now some might say that the combination of Scopus and the WoS may suffice to get a sufficiently good estimate of the denominator. This may be the case for some disciplines, especially in English-speaking countries, but it's far from true in the Social Sciences and Humanities in non-English-speaking countries. The Google Scholar approach applied for the analysis in the pre-print mentioned in the first part of this text is a significantly more pragmatic answer to this ‘denominator issue’, but it is also worth bearing in mind that there are platforms at institutional and national level that have specifically been conceived to collect all research outputs produced in their specific realm. These are CRIS systems.

In the absence of a national CRIS or a comprehensive institutional CRIS implementation, this denominator issue will remain a challenge when trying to estimate the total number of publications produced in a country. CRIS implementation at institutions is however growing very quickly, together with the system interoperability standards that should allow aggregations of CRIS contents to be gradually made available. This is precisely what stakeholders like ministries and research funders are most interested in, and the CERIF-XML standard proves to be the perfect tool for the purpose. A sufficient degree of CRIS/IR interoperability could thus provide the building blocks for developing as accurate a mechanism as possible for Open Access Monitoring.

Another key area mentioned in the previous post was how to deal with embargo periods. Repositories have steadily got much better at this, and the current state of the metadata sets for the major repository solutions could already enable a snapshot from which one could easily work out how long it will take for e.g. 50% of the currently embargoed outputs to become Open Access. There is however a fairly long tail of non-mainstream repository solutions that may still not meet this end-of-embargo specification at metadata level, and again it's hard to assume that the Open Access repository network in a country will cover all the research-performing organisations in it. Zenodo is providing an immensely valuable service to the Open Access community as a whole in this area, but it remains very far from being comprehensive enough even for EU-funded project outputs: far too many institutions in any country we choose to look into are simply not part of the Open Access discussions and working groups.
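As a rough illustration of the kind of snapshot described above, the following Python sketch works out the date by which a given share of the currently embargoed outputs will have become open. It assumes that each record carries an embargo end date and that an output counts as open from that date onwards; the function name and its parameters are hypothetical and not taken from any repository software.

```python
import math
from datetime import date


def date_when_share_open(embargo_end_dates, share=0.5, today=None):
    """Return the date by which `share` of the currently embargoed outputs are open.

    Assumption: an output is open from its embargo end date onwards.
    Returns None if nothing is currently under embargo.
    """
    if not 0 < share <= 1:
        raise ValueError("share must be in the interval (0, 1]")
    today = today or date.today()
    # Only embargoes that have not yet expired count as 'currently embargoed'.
    pending = sorted(d for d in embargo_end_dates if d > today)
    if not pending:
        return None
    # Number of embargoes that must expire to reach the requested share.
    needed = math.ceil(share * len(pending))
    return pending[needed - 1]


ends = [
    date(2018, 1, 1),  # already expired at the snapshot date, so ignored
    date(2018, 6, 1),
    date(2018, 9, 1),
    date(2019, 1, 1),
    date(2019, 4, 1),
]
print(date_when_share_open(ends, share=0.5, today=date(2018, 4, 3)))  # 2018-09-01
```

A calculation like this is only as good as the coverage of the end-of-embargo field: records without that date would have to be reported separately rather than silently dropped.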

Even in the absence of any significant awareness of Open Access, every manager of a research-performing organisation is however interested in monitoring the research outputs that result from the work done in it. This typically means CRIS systems of one sort or another, and this again offers a unique opportunity for drawing these institutions into the national-level OA Monitoring discussion, significantly raising their awareness in the process. If CRIS systems were able to systematically include (and exchange) the metadata elements required to codify aspects like the Open Access status or the start and end dates of embargo periods (see an example for a Pure record below), this would be a key step forward in the effort to collect this information in as comprehensive a way as possible.

As is the case in other areas, the main barrier that this 'vision' quickly meets is that any suggestion to extend data models will need to be implemented by CRIS vendors or by the developers of in-house platforms. While this will typically mean a mid-term timescale, it doesn't mean it cannot be started now.

Not all CRIS systems out there are however provided by big commercial vendors. The early results of the recently closed OCLC/euroCRIS RIM survey show that smaller, more agile players may also have a relevant role to play on this issue. Moreover, just 58% of the research-performing organisations among the over 300 survey respondents reported already having an operational RIM system. There could then be clear opportunities for the gradual implementation of a consensus approach to using CRISs for Open Access monitoring purposes if the vision were there.

Provided we managed to get there, there would still be a number of more technical challenges on how to measure Open Access via RIM systems. Interestingly, this is the area where most of the discussion has taken place thus far (again because of the characteristic bottom-up nature of Open Access implementation). This covers aspects like how to deal with conflicting Open Access flavours (e.g. the rather frequent situation where the same output is simultaneously classed as Green OA in a given repository and Gold OA in a journal or in another repository) and with conflicting Open Access versions. As the OpenAIRE Guidelines for Literature Repositories already do, CERIF could definitely be of significant help when trying to enable the correct status to be assigned to a specific output, but it's also worth bearing in mind that slightly under half of the RIM system instances identified in the above-mentioned survey report CERIF compliance.
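One simple way to handle conflicting flavours is to agree on a precedence order and let the highest-ranking status reported by any source win. The Python sketch below illustrates the idea only: the order used here (Gold before Green before closed) is an assumed policy choice, not something prescribed by CERIF or by the OpenAIRE Guidelines, and the names `PRECEDENCE` and `reconcile_oa_status` are hypothetical.

```python
# Assumed policy for illustration: Gold outranks Green, which outranks closed.
PRECEDENCE = ("gold", "green", "closed")


def reconcile_oa_status(statuses_by_source):
    """Pick a single OA status for one output from the statuses reported per source.

    `statuses_by_source` maps a source name (repository, journal, ...) to the
    status that source reports. Returns None if no source reports anything.
    """
    reported = {status.lower() for status in statuses_by_source.values()}
    unknown = reported - set(PRECEDENCE)
    if unknown:
        # Fail loudly rather than guess how an unfamiliar label should rank.
        raise ValueError(f"unrecognised OA status: {sorted(unknown)}")
    for status in PRECEDENCE:
        if status in reported:
            return status
    return None


# The same output classed differently by two sources.
conflicting = {"institutional repository": "Green", "journal": "Gold"}
print(reconcile_oa_status(conflicting))  # gold
```

Whatever rule is chosen, the important thing for monitoring purposes is that every participating system applies the same one.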

The vision should arguably come first, then the gradual (and slow) implementation, starting with the low-hanging fruit. With a national CRIS in place that is already fully interoperable with the institutional repository network in the country, Norway should be ideally placed to lead in this area. Denmark and the Netherlands also provide neat examples of small countries with a fairly widespread and interoperable CRIS infrastructure where such a vision could start being implemented now. There may well be no one-size-fits-all solution for this complex challenge, but perhaps the hardest question to answer is actually who should lead an ambitious effort that's not limited to a specific national-level scenario, but is to be carried out on top of a combination of fairly different initial conditions. The Knowledge Exchange has done an outstanding job thus far in providing venues where the most advanced countries in the domain can compare and discuss their specific approaches, but some kind of distributed supranational leadership needs to persist in an area where many other countries are still waiting in line.