Documenting LRs

Before a resource can be shared in the CLARIN:EL Research Infrastructure, it must be described with appropriate metadata: a set of data that gives basic information about the resource the provider intends to share, such as its title, a description, technical information, and information about its availability and licensing. For this purpose, CLARIN:EL provides a metadata editor that guides the provider through the description of language resources (LRs). Once described, resources can be uploaded to the CLARIN:EL Infrastructure.

The CLARIN:EL Resource Documentation Service aims to improve the accessibility of resources while complying with common standards and specifications for metadata encoding:

  • Documentation of language resources, tools and web services with the endorsed metadata model, i.e. the META-SHARE model (currently v.3.0.2)
  • Assignment of persistent identifiers (PIDs) to language resources, tools and web services contributed to CLARIN:EL

 

Read more:

The CLARIN:EL Research Infrastructure uses the META-SHARE metadata model for the description and documentation of Language Resources (LRs).

The central entity of the META-SHARE ontology is the Language Resource itself. In the ontology, however, LRs are linked to satellite entities such as reference documents related to the LR (papers, reports, manuals etc.), persons/organizations involved in its creation and use (creators, distributors etc.), related projects and activities (funding projects, activities of usage etc.), accompanying licenses, etc. The interconnection between the LR and these satellite entities depicts the LR's lifecycle from production to use.

LRs are classified along two main classification axes: Resource Type and Media Type (i.e. the medium on which the LR is implemented).

Resource Type

  • Corpus: Written/text corpora, Oral/spoken corpora, Multimodal/multimedia corpora
  • Lexical/Conceptual resource: Terminological resources, Word lists, Semantic lexica, Ontologies
  • Language description: Grammars, Typological databases, Language models
  • Tool/Service: Processing tools, Applications, Web services

Media Type

  • Text
  • Audio
  • Video
  • Image
  • TextNumerical
  • TextNgram

Each LR may take more than one mediaType value, since an LR can consist of parts belonging to different types of media: e.g., a multimodal corpus includes a video part (moving image), an audio part (dialogues) and a text part (subtitles and/or transcription of the dialogues). The mediaType values are: text, audio, video, image, textNumerical and textNgram.
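
The two classification axes can be pictured as a small data structure. The following is only an illustrative Python sketch, not part of the META-SHARE schema or of any CLARIN:EL software: the class names and the resource type identifier strings are assumptions made for this example, while the media type values are the ones listed in the text. It shows a multimodal corpus carrying three media types at once.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet


class ResourceType(Enum):
    # Identifier strings are assumptions derived from the category names.
    CORPUS = "corpus"
    LEXICAL_CONCEPTUAL_RESOURCE = "lexicalConceptualResource"
    LANGUAGE_DESCRIPTION = "languageDescription"
    TOOL_SERVICE = "toolService"


class MediaType(Enum):
    # Values as listed in the text.
    TEXT = "text"
    AUDIO = "audio"
    VIDEO = "video"
    IMAGE = "image"
    TEXT_NUMERICAL = "textNumerical"
    TEXT_NGRAM = "textNgram"


@dataclass(frozen=True)
class LanguageResource:
    """Hypothetical record: one resource type, one or more media types."""
    name: str
    resource_type: ResourceType
    media_types: FrozenSet[MediaType]


# A multimodal corpus with a video part, an audio part and a text part.
multimodal_corpus = LanguageResource(
    name="Example multimodal corpus",
    resource_type=ResourceType.CORPUS,
    media_types=frozenset({MediaType.VIDEO, MediaType.AUDIO, MediaType.TEXT}),
)

print(sorted(m.value for m in multimodal_corpus.media_types))
# prints ['audio', 'text', 'video']
```

A set is used for the media types because the order of the parts carries no meaning and each media type appears at most once per resource.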

More information about the META-SHARE metadata model can be found here.

The metadata records that describe LRs shared through CLARIN:EL must be included in the Infrastructure, a process supported by the CLARIN:EL metadata editor.

To access the editor, you must have a user account with editor credentials; more information on the process to follow when depositing resources is found here. Once you are notified that you have obtained the proper credentials and have logged in, you will be able to add metadata records for your resources. Depending on your status, check the appropriate user manual, which will guide you through the process.

To help users through the documentation process, we have prepared a set of sample XML metadata records for the most common resource types; these are compatible with the minimal META-SHARE schema. Users can upload them in the CLARIN:EL editor as seed records, and then change and enrich them with the properties of their own LRs.
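
Users who prefer to adapt a seed record locally before uploading it could do so with a few lines of Python. The sketch below is a rough illustration under stated assumptions: the XML fragment and its element names (resourceInfo, resourceName, description, mediaType) are simplified placeholders invented for this example, not the actual sample records or the real META-SHARE schema, which uses namespaces and many more fields. The function customise_seed is likewise hypothetical.

```python
import xml.etree.ElementTree as ET

# Placeholder seed record; element names are illustrative only and do not
# reproduce the real META-SHARE schema.
SEED_RECORD = """<resourceInfo>
  <resourceName lang="en">Sample corpus</resourceName>
  <description lang="en">Replace with a description of your resource.</description>
  <mediaType>text</mediaType>
</resourceInfo>"""


def customise_seed(seed_xml: str, name: str, description: str) -> str:
    """Return a copy of the seed record with the name and description replaced."""
    root = ET.fromstring(seed_xml)
    root.find("resourceName").text = name
    root.find("description").text = description
    # Elements that are not touched (here: mediaType) are kept as they are.
    return ET.tostring(root, encoding="unicode")


updated = customise_seed(
    SEED_RECORD,
    name="My interview corpus",
    description="Transcribed interviews collected for a hypothetical project.",
)
print(updated)
```

With a real seed record, the same approach would need the schema's actual element names and namespace; the CLARIN:EL editor remains the supported way to change and enrich a record.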

Metadata records (XML files) are available for the following types of resources: