As indicated in a previous post, the next CERIF release, CERIF 1.6, arrives before the summer. It follows decisions taken in a series of CERIF task group meetings: Athens (10/2012), Braga (02/2013) and Rome (03/2013). As agreed by the CERIF task group, the CERIF 1.6 release is meant for extensive testing, to gather feedback and input for the next major release, CERIF 2.0. The formal CERIF 1.6 models are now available:
- CERIF 1.6 XML (publicly available from the euroCRIS website)
- CERIF 1.6 SQL (internally available for euroCRIS members)
Major updates in the current CERIF 1.6 release centre around the CERIF entity cfResultProduct (cfResProd). CERIF is a formal model (supplying a formal syntax and declared semantics) that allows entities and their relationships to carry different meanings in different contexts. All entities, including cfResultProduct (cfResProd), are therefore enhanced with semantic (contextual) information in addition to their naming (syntax), to make them more meaningful. Such enhancements are implemented through the so-called CERIF Semantic Layer and can be seen as contextual vocabularies that set the boundaries of a meaning.
The formal CERIF entity cfResultProduct, referred to by its short name cfResProd, is in fact a container that aggregates all potential types. Over time, the legacy usage of the cfResultProduct entity has shaped its meaning as “data” or “dataset”, and the discussions within the CERIF task group that led to the current updates started from this understanding (see also “Datasets in CERIF”). Note that in CERIF a ‘product’ is not to be confused with a ‘commercial product’; it is a result ‘product’ of research activities. Formally, a type such as “dataset” is stored within the CERIF Semantic Layer, where it maintains its own identifier, namespace, examples, descriptions, source of origin, etc. (see also CERIF in Brief).
The major updates in the current CERIF release, which support the recording and thus the understanding of datasets, were informed by the Jisc-funded C4D (CERIF for Datasets) project, following investigations* of CKAN, DCAT and eGMS. The short list below indicates the updates; more details will be published on the euroCRIS website over time:
- addition of Alternative Title for ‘dataset’
- new link entity from ‘dataset’ to geographic bounding box
- new link entity from geographic bounding box to measurement
- new DateTime attribute with measurements
- lineage/provenance is considered a time-stamped role ‘measurement’ of a ‘dataset’
- a comment in general is understood as a ‘measurement’, e.g. cfMeas.cfValJudgeText="This is a comment that allows for … etc."
- new attribute cfOrder in links, e.g. those from Results to Persons and Organisations
- no changes to the Localisation entities (cfLanguage, cfCountry, cfCurrency)
- deprecation of a few attributes in future releases
- guidance on the handling of dates
- incorporating and drawing inspiration from existing vocabularies and governance structures, such as CASRAI, VIVO, ISOCAT, V4OA, SKOS, RDF, etc.
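To make two of the listed updates a little more concrete, here is a minimal Python sketch. It is not CERIF code and not part of the release: the class names, the person_id field and the integer type of the order value are assumptions made for illustration only. Only the names cfMeas, cfValJudgeText and cfOrder come from the list above, and the idea that cfOrder expresses a ranking (for example the order of contributors) is an interpretation of the attribute name.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Measurement:
    """Hypothetical stand-in for a cfMeas record used as a comment."""
    cf_val_judge_text: str  # mirrors cfMeas.cfValJudgeText from the list above


@dataclass
class ResultPersonLink:
    """Hypothetical stand-in for a link from a Result to a Person."""
    person_id: str  # illustrative identifier, not a CERIF attribute name
    cf_order: int   # mirrors the new cfOrder attribute; integer type is assumed


def ordered_person_ids(links: List[ResultPersonLink]) -> List[str]:
    """Return the linked person identifiers sorted by their cfOrder value."""
    return [link.person_id for link in sorted(links, key=lambda l: l.cf_order)]


# A comment recorded as a 'measurement'.
comment = Measurement(cf_val_judge_text="This is a comment.")

# Links stored in arbitrary order; cfOrder restores the intended sequence.
links = [
    ResultPersonLink(person_id="person-b", cf_order=2),
    ResultPersonLink(person_id="person-a", cf_order=1),
]
print(ordered_person_ids(links))
```

An explicit order attribute on the link means the sequence does not depend on the order in which link records happen to be stored or exported.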
Some more CERIF XML examples will be posted on this blog shortly. For testing and proper validation of CERIF 1.6 XML files, the following header should be added:
<CERIF xmlns="urn:xmlns:org:eurocris:cerif-1.6-2" xsi:schemaLocation="urn:xmlns:org:eurocris:cerif-1.6-2 http://www.eurocris.org/Uploads/Web%20pages/CERIF1.6/CERIF_1.6_2.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" date="2013-07-24" sourceDatabase="LabelForYourData">
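Before running a file through a schema validator, it can help to check that the header was copied correctly. The sketch below uses only the Python standard library: it parses a document, confirms that the root element is CERIF in the 1.6 namespace and that xsi:schemaLocation pairs that namespace with a schema URL. It does not validate against the XSD itself (the standard library cannot do that), and the function name check_header is a hypothetical helper, not part of any CERIF tooling.

```python
import xml.etree.ElementTree as ET

CERIF_NS = "urn:xmlns:org:eurocris:cerif-1.6-2"
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"

# The header from this post, split over several lines for readability.
HEADER = (
    '<CERIF xmlns="urn:xmlns:org:eurocris:cerif-1.6-2" '
    'xsi:schemaLocation="urn:xmlns:org:eurocris:cerif-1.6-2 '
    'http://www.eurocris.org/Uploads/Web%20pages/CERIF1.6/CERIF_1.6_2.xsd" '
    'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
    'date="2013-07-24" sourceDatabase="LabelForYourData">'
)


def check_header(document: str) -> dict:
    """Parse a CERIF XML document and return the key header values.

    Raises xml.etree.ElementTree.ParseError if the document is not
    well-formed, and ValueError if the root element or schemaLocation
    does not match the CERIF 1.6 header.
    """
    root = ET.fromstring(document)
    # ElementTree reports namespaced tags as "{namespace}localname".
    namespace, _, local_name = root.tag[1:].partition("}")
    if local_name != "CERIF" or namespace != CERIF_NS:
        raise ValueError("root element is not CERIF in the 1.6 namespace")
    # schemaLocation holds a whitespace-separated namespace/URL pair.
    parts = root.get("{%s}schemaLocation" % XSI_NS, "").split()
    if len(parts) != 2 or parts[0] != namespace:
        raise ValueError("xsi:schemaLocation does not match the namespace")
    return {
        "namespace": namespace,
        "schema_namespace": parts[0],
        "schema_url": parts[1],
        "date": root.get("date"),
        "source_database": root.get("sourceDatabase"),
    }


# An otherwise empty document: the header plus its closing tag.
info = check_header(HEADER + "</CERIF>")
print(info["schema_url"])
```

The check is deliberately narrow: it catches a mistyped namespace or a broken schemaLocation pair early, while full validation is left to an XSD-aware validator.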
* For approaches to mapping CKAN, DCAT and eGMS to CERIF, see the appendix of the paper “A multi-level metadata approach for a Public Sector Information data infrastructure” by N. Houssos, B. Joerg, B. Matthews, in the CRIS 2012 Proceedings.