Documenting LRs

Before a resource can be shared in the CLARIN:EL Research Infrastructure, it must be described with appropriate metadata: a set of data giving basic information about the resource that the provider intends to share, such as its title, a description, technical information, and information about its availability and licensing. For this purpose, CLARIN:EL provides a metadata editor that guides the provider through the description of language resources (LRs). Once described, resources can be uploaded to the CLARIN:EL Infrastructure.

The CLARIN:EL Resource Documentation Service aims to improve the accessibility of resources while complying with common standards and specifications for metadata encoding:

  • Documentation of language resources, tools and web services with the endorsed metadata model, i.e. the CLARIN-SHARE model (currently v.1.0.1)
  • Assignment of persistent identifiers (PIDs) to language resources, tools and web services contributed to CLARIN:EL

The CLARIN:EL Research Infrastructure uses the CLARIN-SHARE metadata model, which is based on the META-SHARE metadata model, for the description and documentation of Language Resources.

The central entity of the CLARIN-SHARE ontology is the Language Resource (LR) itself. In the ontology, however, LRs are linked to satellite entities such as

  • reference documents related to the LR (papers, reports, manuals etc.),
  • persons/organizations involved in its creation and use (creators, distributors etc.),
  • related projects and activities (funding projects, activities of usage etc.),
  • accompanying licenses, etc.

The interconnection between the LR and these satellite entities depicts the LR’s lifecycle from production to use.

LRs are classified along two main classification axes: Resource Type and Media Type (i.e. the medium on which the LR is implemented).

Resource Type

  • Corpus: Written/text corpora, Oral/spoken corpora, Multimodal/multimedia corpora
  • Lexical/Conceptual resource: Terminological resources, Word lists, Semantic lexica, Ontologies
  • Language description: Grammars, Typological databases, Language models
  • Tool/Service: Processing tools, Applications, Web services

Media Type

  • Text
  • Audio
  • Video
  • Image
  • TextNumerical
  • TextNgram

Each LR may take more than one mediaType value, since LRs can consist of parts belonging to different types of media: e.g., a multimodal corpus includes a video part (moving image), an audio part (dialogues) and a text part (subtitles and/or transcription of the dialogues). The mediaType values are: text, audio, video, image, textNumerical and textNgram. More information about the CLARIN-SHARE metadata model can be found here.
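
The two classification axes can be illustrated with a small Python sketch. It is only an illustration of the idea that one LR has a single Resource Type but may carry several mediaType values; it is not the CLARIN-SHARE schema itself. The class names, field names and the example title are hypothetical, and only the mediaType strings (text, audio, video, image, textNumerical, textNgram) and the Resource Type labels are taken from the description above.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class ResourceType(Enum):
    # Labels as listed under "Resource Type"; the member names are hypothetical.
    CORPUS = "Corpus"
    LEXICAL_CONCEPTUAL_RESOURCE = "Lexical/Conceptual resource"
    LANGUAGE_DESCRIPTION = "Language description"
    TOOL_SERVICE = "Tool/Service"


class MediaType(Enum):
    # The six mediaType values named in the text.
    TEXT = "text"
    AUDIO = "audio"
    VIDEO = "video"
    IMAGE = "image"
    TEXT_NUMERICAL = "textNumerical"
    TEXT_NGRAM = "textNgram"


@dataclass
class LanguageResource:
    """Hypothetical, simplified stand-in for a metadata record."""

    title: str
    resource_type: ResourceType
    # A set, because one LR can consist of parts in different media.
    media_types: set[MediaType] = field(default_factory=set)


# The multimodal corpus from the text: video, audio and text parts.
multimodal = LanguageResource(
    title="Example multimodal corpus",  # made-up title
    resource_type=ResourceType.CORPUS,
    media_types={MediaType.VIDEO, MediaType.AUDIO, MediaType.TEXT},
)

media_values = sorted(m.value for m in multimodal.media_types)
print(media_values)  # ['audio', 'text', 'video']
```

Modelling the media types as a set rather than a single field mirrors the point made above: the Resource Type identifies what kind of LR it is, while the media types enumerate the parts it is made of.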

The metadata records that describe LRs shared through CLARIN:EL must be included in the Infrastructure, a process supported by the CLARIN:EL metadata editor.

To access the editor, add metadata records for your resources and deposit them in the CLARIN:EL Infrastructure, you must be a registered user with curator rights and permissions. More information on how to share your resources through CLARIN:EL can be found here.

You can find more information and detailed instructions on how to create a resource by using the CLARIN:EL metadata editor in the CLARIN:EL User Manual here.