In this section you will find definitions of key concepts related to the CLARIN:EL Research Infrastructure, as well as answers to frequently asked questions about participation in the CLARIN:EL network, the use of the Infrastructure, and technical and legal issues.
The CLARIN project is a European initiative to collect, organise and, finally, distribute to the research community language resources in all European languages, through a research infrastructure which will also offer language tools.
This initiative mainly targets the research community, but it also addresses the general public. Organisation-wise, it takes the form of a pan-European network of research centres and offers:
- access to resources, services and language processing tools
- language resources documentation through metadata
- coordination of the creation, storage, management and access to the language resources
- training and dissemination on the use of language technology.
CLARIN:EL is the Greek part of the CLARIN infrastructure. You can find additional information here.
By the terms Language Resources (LRs) and Language Technologies (LTs) we refer to language data (written or spoken) and to the tools used for their processing. We distinguish the following categories:
- primary data
  - digital/digitised resources, such as written texts (e.g. digitised books, web texts, newspapers, corpora etc.), recordings of spoken language (e.g. interviews, radio broadcasts etc.)
  - video recordings (e.g. TV shows, facial expressions collections, gestures etc.)
  - images (e.g. digital/digitised photographs with their captions etc.)
- processed data
  - various types of annotations of texts, sound and multimedia data, automatically or manually created (e.g. morphosyntactically annotated texts, transcriptions of spoken data, video annotations etc.)
- reference resources
  - various types of structured language data (e.g. word lists, dictionaries, thesauri etc.) which can be used for improved organisation, processing and study of primary data
- language technology tools/applications
  - tools and integrated applications that perform various types of language processing (e.g. multilingual text alignment, morphological annotation, lemmatisation, parsing, knowledge extraction etc.)
  - visualisation tools (e.g. integrated environments for the presentation of texts, multimedia collections, processing results etc.)
The CLARIN:EL Central Aggregator (or Central Catalogue) is the central repository of the CLARIN:EL Infrastructure, which is responsible for
- the harvesting of metadata from the local repositories,
- the organisation and the presentation of the metadata descriptions in a uniform catalogue and
- the provision of access to the language resources to the network members and to the public.
Language Technologies (LTs) are workflows of tools for multilevel analysis, processing, annotation, enrichment and transformation of language data.
Language Processing Services are services allowing the use of Language Resources and Technologies as well as their applications over the web.
A Resource Provider is any organisation or individual that makes Language Resources, Technologies and/or language processing services available to the CLARIN:EL Infrastructure.
With the term Language Data we refer to digital language content of any form and medium, structured or unstructured.
Language Processing Tools are computational tools in the form of software, aiming at the
- processing, and
- annotation of language data.
Organisations wishing to join the CLARIN:EL network and contribute language resources, tools and language processing web services to the Infrastructure need to apply to become members by filling in the online Membership Application (in Greek).
Individuals wishing to join the CLARIN:EL network and share through the CLARIN:EL Infrastructure digital content, language resources and/or language processing tools they have developed need to register with the Infrastructure using their academic or personal account. More information can be found here.
Member organisations of the network can set up their own repositories, if they so wish. There they will be able to store, document, manage and curate their resources.
Academic and/or Research Organisations wishing to join the CLARIN:EL network and contribute language resources, tools and language processing web services to the Infrastructure first need to apply to become members (Online Membership Application).
After the Membership Application has been examined and approved by the CLARIN:EL General Assembly (GA), the CLARIN:EL Research Infrastructure contacts the Organisation with more information concerning the creation of an Institutional Repository for the new Member Organisation.
CLARIN:EL non-registered users (guests) can download only the open Language Resources (in accordance with the licensing terms and conditions of use imposed by the LR providers). More information on how to view and download a resource in the CLARIN:EL Infrastructure can be found here.
CLARIN:EL registered users can download all Language Resources (not only the open ones) in accordance with the licensing terms and conditions of use imposed by the LR providers. More information on how to view and download a resource in the CLARIN:EL Infrastructure can be found here.
In order to store and share your resources through CLARIN:EL you must be a registered user. More information on how to share your resources through the CLARIN:EL Infrastructure can be found here or in the CLARIN:EL User Manual which is freely accessible to all (registered and non-registered users of the Infrastructure).
If you are affiliated with an Organisation (University, Research Centre etc.), you can use your academic credentials (Academic ID) to log in to the CLARIN:EL Research Infrastructure. Click on the Sign in button at the top right of the CLARIN:EL Central Inventory homepage, and then click on the Greek academic login button to use your academic credentials. In this case, you will be automatically connected to the Repository of your Organisation in CLARIN:EL, through which you will be able to store and share your resources. You can find detailed instructions on how to sign in using your academic account here, or in the CLARIN:EL User Manual.
Through the CLARIN:EL Infrastructure you can share digital language data of various language modalities (written, spoken, multimodal, sign, lexical/conceptual, etc.) and in various media (text, audio, video, etc.), as well as language processing tools and web services:
- text corpora
- lexical/conceptual resources
- models & computational grammars
- language processing tools & services, e.g. lemmatizers, tokenizers, part of speech taggers, dependency parsers, terminology extractors, information extractors, annotation tools, etc.
More information on how to create and share a resource through the CLARIN:EL Infrastructure can be found here.
For more information and guidance regarding the file formats which are recommended for depositing resources in CLARIN:EL in order to ensure long-term accessibility and interoperability, please also consult the document CLARIN:EL Recommended file formats.
In order to deposit their resources, users have to be registered users and members of one of the repositories of the infrastructure.
The procedure for language resource description and uploading has the following steps (detailed guidelines can be found here):
- Initially, the user-curator needs to describe the resource, that is, to add its metadata. This can be done in two ways:
  - using the dedicated tool (metadata editor) provided by the infrastructure. Once all the obligatory fields are filled in, the description can be saved and stored.
  - by uploading existing descriptions. This is the case of descriptions created independently of the tool; these have to be in the form of XML files. The infrastructure checks the compatibility of the descriptions with the metadata schema used by the infrastructure and, if they are compatible, they are stored.
- Following the storage of the metadata, the next stage is the storage of the data itself. At this stage, the user-curator uploads the data to the Infrastructure. The data have to be compressed (files 'zip', 'tar.gz', 'gz', 'tgz', 'tar', 'bzip2', or 'bz2').
- At the last stage, the user-curator is informed by email that the process has been completed.
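Before uploading, it can help to check locally that a metadata description is at least well-formed XML and that the data file uses one of the compressed formats listed above. The following is a minimal sketch in Python, not part of the CLARIN:EL tooling. The function names are hypothetical, and the checks cover only XML well-formedness and file extensions. Compatibility with the CLARIN:EL metadata schema is still verified by the infrastructure itself.

```python
import xml.etree.ElementTree as ET

# Suffixes taken from the list of accepted compressed formats above.
ACCEPTED_SUFFIXES = (".zip", ".tar.gz", ".gz", ".tgz", ".tar", ".bzip2", ".bz2")


def is_well_formed_xml(text: str) -> bool:
    """Return True if `text` parses as XML (well-formedness only, no schema check)."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


def has_accepted_suffix(filename: str) -> bool:
    """Return True if `filename` ends with one of the accepted compressed-file suffixes."""
    return filename.lower().endswith(ACCEPTED_SUFFIXES)
```

For instance, `has_accepted_suffix("corpus.tar.gz")` returns `True`, while `has_accepted_suffix("corpus.txt")` returns `False`.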
In order to process one of your resources (that is, a resource that is not stored in the infrastructure) with one of the web services of the infrastructure (that is, a language processing tool used over the internet), you first have to select the web service you want and then to upload your resource for processing, following these steps:
- From the central inventory homepage select browse.
- Filter the catalog of available resources to locate the web service you want:
  - first restrict the catalog to tools/services only by selecting Filter by > Resource Type = Tool Service
  - then restrict the results to web services only by selecting Filter by > Processing service = yes
- Select the web service you want.
- In the new page that opens, you can see the description of the service. Select the Access tab and then the Use button to start uploading your resource.
- In the new page, click on the Use this workflow button. Then upload the data you want to process.
  - NOTE: the resource must be in a zip file not exceeding 2 MB, with a filename in Latin characters and no spaces.
- Select Next.
  - NOTE: the resource you upload as well as the processing outcome will be stored for two (2) days in your repository (Institutional or Hosted Resources Repository). After this period, the resources will be deleted.
- The infrastructure informs you about the progress of the processing, and, when it is completed, you receive an email with a link to the processed resource.
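The upload constraints for web service processing (a zip file, at most 2 MB, with a filename in Latin characters and no spaces) can be checked locally before you start. The sketch below is only an illustration under stated assumptions: "2 MB" is read as 2 * 1024 * 1024 bytes, "Latin characters" is read as ASCII letters, digits, dots, underscores and hyphens, and the function name `upload_problems` is hypothetical. The infrastructure may apply these rules differently.

```python
import re
import zipfile
from pathlib import Path

MAX_BYTES = 2 * 1024 * 1024  # assumption: "2 MB" read as 2 MiB
# assumption: "Latin characters" read as ASCII letters, digits, '.', '_' and '-'
SAFE_NAME = re.compile(r"[A-Za-z0-9._-]+")


def upload_problems(path: str) -> list[str]:
    """Return the reasons why `path` may be rejected; an empty list means none were found."""
    p = Path(path)
    problems = []
    if not SAFE_NAME.fullmatch(p.name):
        problems.append("filename must use Latin characters only, with no spaces")
    if not p.is_file():
        # Nothing more can be checked without the file itself.
        problems.append("file does not exist")
        return problems
    if not zipfile.is_zipfile(p):
        problems.append("file is not a zip archive")
    if p.stat().st_size > MAX_BYTES:
        problems.append("file exceeds 2 MB")
    return problems
```

An empty list from `upload_problems("my_corpus.zip")` means that none of the checked constraints is violated. It does not guarantee that the service will accept the content of the archive.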
Through the CLARIN:EL Infrastructure you can store and share your data in a free, safe, friendly and transparent environment.
With a firm orientation towards the creation of an openness culture and the relevant ecosystem for LRs, licensing LRs, tools and/or services is a key concern of the CLARIN:EL Research Infrastructure. In this context, the CLARIN:EL Infrastructure fosters open data sharing, believing that this benefits not only users (consumers) but data providers as well. However, all interested parties can also safely share "closed" resources with the community while retaining full control over the distribution of the resource.
Your data is visible through Google, VLO, DataCite, OLAC, Data Citation Index, arXive, etc., giving you maximal credit for your work.
Data is easy to cite. CLARIN:EL provides ready-to-use one-click citations in DataCite format, including the resource title, the resource version, the creation date, the resource type, the publisher, as well as the PID of the resource. This ensures that your resource is fully and correctly cited.
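To make the list of citation fields concrete, the sketch below assembles them into a single citation line. It is an illustration only: the class and function names are hypothetical, all values are made up (including the placeholder PID), and the order and punctuation are an assumption. It does not reproduce the exact output of the CLARIN:EL one-click citation.

```python
from dataclasses import dataclass


@dataclass
class ResourceCitation:
    # The fields mirror those listed above; all values used below are made up.
    title: str
    version: str
    creation_date: str
    resource_type: str
    publisher: str
    pid: str


def format_citation(c: ResourceCitation) -> str:
    """Join the citation fields into one line (the layout is an assumption)."""
    return (
        f"{c.title} ({c.creation_date}). Version {c.version}. "
        f"[{c.resource_type}]. {c.publisher}. {c.pid}"
    )


example = ResourceCitation(
    title="Example Corpus",
    version="1.0.0",
    creation_date="2020",
    resource_type="Corpus",
    publisher="Example Publisher",
    pid="http://hdl.handle.net/00000/example",  # placeholder, not a real PID
)
print(format_citation(example))
```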
See "Why should I share my data through CLARIN:EL?". Everything applies to software tools too.
Language processing tools can be deposited in the CLARIN:EL Infrastructure by creating a metadata record and then:
- adding a reference to the website where the tools are already in use, and/or
- storing the code in the CLARIN:EL Infrastructure, where users can download the tools and use them locally, and/or
- converting the tools into web services running on the CLARIN:EL Infrastructure (in cooperation with the CLARIN:EL Technical Team).
If you encounter any problem while logging in to the CLARIN:EL Infrastructure, please let us know through our Technical Helpdesk at email@example.com.
An error might occur the first time you log in to the CLARIN:EL Infrastructure using your academic account. This error indicates that your authentication was successful but that your identity provider does not provide sufficient information to the CLARIN:EL Infrastructure, e.g. your email address. In other words, your home institution (e.g. University, Organisation, etc.), in order to protect your personal data, did not send us enough data about you to operate our service. Please note that the only information needed to complete your registration in CLARIN:EL is your first name, last name and email address.
You can upload only metadata descriptions to CLARIN:EL, while the resource itself might be available from elsewhere (another URL, through the owner etc.). However, resources that are not in CLARIN:EL cannot be processed with the available CLARIN:EL language processing web services. So, if you would like to use these services on your resources, it is best to upload the data as well as the metadata to the CLARIN:EL infrastructure.
Computational Support Services for the CLARIN:EL infrastructure are
- assignment of Persistent Identifiers (PIDs),
- user authentication and authorization services (Authentication and Authorization Infrastructure, AAI),
- storage services, and
- provision of computational power for the execution of web services.
The Hosted Resources Repository (HRR) hosts resources offered by providers that do not maintain a repository and makes them available to the users of the infrastructure. It also hosts the metadata descriptions of these resources and makes them available for harvesting by the Central Inventory. Finally, it offers user management services to its users.
Each Institutional Repository hosts the resources of the relevant institution/organisation and makes them available to the users of the infrastructure according to the appropriate licenses. It also hosts the metadata descriptions for these resources and makes them available for harvesting by the Central Inventory. Finally, it offers user management services to its users.
The Central Inventory hosts the central resource catalogue of the infrastructure, which contains information for all the resources of the infrastructure. The Central Inventory gathers this information by harvesting the metadata from the local repositories. Finally, together with the CLARIN:EL portal, it hosts the technical and legal support services of the CLARIN:EL Infrastructure.
A PID (Persistent Identifier) is a unique and persistent resource identifier that is automatically assigned to each resource contributed to CLARIN:EL. The PID takes the form of a URL that provides a permanent link, which will resolve correctly even if the data is moved at some point in the future. It is highly recommended that you use the PID (handle) when you want to cite a CLARIN:EL resource.
Any change to the data and metadata of a published resource in the CLARIN:EL Infrastructure means the creation and publication of a new version of that resource, with a new PID. However, if the changes are minimal (e.g., typos or clear mistakes), please contact the CLARIN:EL Metadata Helpdesk at firstname.lastname@example.org. After discussion with the CLARIN:EL metadata team, it will be decided whether or not these changes should lead to the creation and publication of a new version of the resource.
More information on the actions you are allowed to perform on your resources can be found in the CLARIN:EL User Manual.
If you wish to withdraw a published resource from the CLARIN:EL Central Inventory, you can ask for it to be unpublished. This is done from your Dashboard by clicking on the View my resources button and then selecting the Request to unpublish action from the Actions field. Once your request has been successfully submitted, the Supervisor of the Institutional Repository where the resource was published will unpublish your resource from the CLARIN:EL Central Inventory.
More information on how to unpublish a resource from CLARIN:EL Central Inventory can be found in the CLARIN:EL User Manual.
For any questions and/or clarifications, you can always contact the CLARIN:EL Technical Helpdesk at email@example.com.
The CLARIN:EL Infrastructure promotes the reuse and sharing of open digital language resources, as well as language technology tools and services, in accordance with Open Data Principles and FAIR Data Principles. However, through CLARIN:EL you can also upload and store "closed" data if it is impossible for them to be made publicly accessible. In such cases, please contact the CLARIN:EL Legal Helpdesk at firstname.lastname@example.org.
Quite safe! The data retention and storage policy of the CLARIN:EL Infrastructure includes a specific protection and disaster recovery plan:
- All the data of the CLARIN:EL Institutional Repositories are stored on servers at the Athena RC premises, while an on-site backup copy is created and maintained for each one of them.
- Another copy (off-site copy) is created and maintained off-site (specifically, at GRNET).
- A regular check of the integrity of your data is carried out.
As you may have already read here, it is not possible to edit the data and/or metadata of resources once they have been published in the CLARIN:EL Central Inventory. If you wish to update and/or make changes to the data and/or metadata of a published resource, you need to create a new version of the resource, as described in detail here.
Licensor
The person who is legally eligible to license and actually licenses the resource. The licensor could be different from the creator, the distributor or the Intellectual Property Rights (IPR) holder. The licensor has the necessary rights or licences to license the work and is the party that actually licenses the resource that enters the CLARIN:EL network. The licensor will have obtained the necessary rights or licences from the IPR holder and may have a distribution agreement with a distributor that disseminates the work under a set of conditions defined in the specific licence and collects revenue on the licensor's behalf. The attribution of the creator, separately from the attribution of the licensor, may be part of the licence under which the resource is distributed (as is the case, for example, with Creative Commons Licences).
Distribution Rights Holder
The person or organization that holds the distribution rights. The range and scope of distribution rights is defined in the distribution agreement. The distributor in most cases only has a limited licence to distribute the work and collect royalties on behalf of the licensor or the IPR holder and cannot give to any recipient of the work permissions that exceed the scope of the distribution agreement (e.g. to allow uses of the work that are not defined in the distribution agreement).
IPR Holder
The person or organization that holds the full Intellectual Property Rights (copyright, trademark etc.) that subsist in the resource. The IPR holder could be different from the creator, who may have assigned the rights to the IPR holder (e.g. an author as a creator assigns her rights to the publisher, who is the IPR holder), and from the distributor, who holds a specific licence (i.e. a permission) to distribute the work within the CLARIN:EL network.
Open licenses are standard, irrevocable public licenses that allow rights holders to share their work. Open licenses cover all types of work, such as content, data and software.
Every license allows users to use the licensed work subject to the terms that apply. Rights holders may license the work subject to attribution of the source, non-commercial use, no derivative works, or share-alike, which means that the work or any derivatives of it must be redistributed under identical terms.
The most common open licenses are
- For content
Useful information on the different types of Public Domain Data (for example Public Sector Information, PSI) as well as their restrictions of use can be found here.
PSI licences constitute a key instrument for the release of huge amounts of information by Public Sector Bodies (PSBs). Useful information on key types of licensing, making reference to PSI licences as well, can be found here.
- Commission encourages re-use of public sector data http://europa.eu/rapid/press-release_IP-14-840_en.htm
- Open data Portals https://ec.europa.eu/digital-agenda/en/open-data-portals
- Copyright Law, 2011/833/EU http://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX:32011D0833
Licensing LRs and more specifically selecting a licence of use that legally binds the End-User to the terms and conditions of using the resource is a key concern of the CLARIN:EL Research Infrastructure.
With a firm orientation towards the creation of an openness culture and the relevant ecosystem for LRs, the CLARIN:EL Research Infrastructure fosters Open Data policies. Thus, resources and services should ideally be open or shared at least for research purposes. In all cases, LRs must be offered according to formal legal terms and conditions clearly indicated in the licence text.
To limit the complexity of licensing, a range of recommended license setups are provided by CLARIN:EL in the form of templates for the LR providers to choose from.
More information about the CLARIN:EL recommended licensing scheme can be found here.
If you encounter a problem or need help selecting the right licence for your data, please contact the CLARIN:EL Legal Helpdesk at email@example.com.
The CLARIN:EL model licensing scheme recommended by the CLARIN:EL Research Infrastructure can be found here (tab: Recommended licensing scheme) and is organised on the following two axes:
- Open Licences: Creative Commons licences (CC, starting with Creative Commons Zero (CC-0) and all possible combinations along the CC differentiation of rights of use) for datasets and Free Open Source Software (FOSS) licences for software are the first level of legal machinery applied.
- Licences restricting redistribution: META-SHARE No Redistribution licences, a set of licenses that allow use and exploitation of the resource, but not its further redistribution.
For further information regarding the licensing of resources in the CLARIN:EL Infrastructure, please feel free to contact the CLARIN:EL Legal Helpdesk at firstname.lastname@example.org.