In this section you will find definitions of key concepts related to the CLARIN:EL Research Infrastructure, as well as answers to frequently asked questions about participation in the CLARIN:EL network, the use of the Infrastructure, and technical and legal issues.
The CLARIN project is a European initiative to collect, organise and, finally, distribute to the research community language resources in all European languages, through a research infrastructure which will also offer language tools.
This initiative mainly targets the research community, but it also addresses the general public. In terms of organisation, it takes the form of a pan-European network of research centers and offers:
- access to resources, services and language processing tools
- language resources documentation through metadata
- coordination of the creation, storage, management and access to the language resources
- training and dissemination on the use of language technology.
CLARIN:EL is the Greek part of the CLARIN infrastructure. You can find additional information here.
By the terms Language Resources (LRs) and Language Technologies (LTs) we refer to language data (written or spoken) and to the tools used for their processing. We distinguish the following categories:
- primary data
- digital/digitised resources, such as written texts (e.g. digitised books, web texts, newspapers, corpora etc.), recordings of spoken language (e.g. interviews, radio broadcasts etc.)
- video recordings (e.g. TV shows, facial expressions collections, gestures etc.)
- images (e.g. digital/digitised photographs with their captions etc.)
- processed data
- various types of annotations of texts, sound and multimedia data, automatically or manually created (e.g. morphosyntactically annotated texts, transcriptions of spoken data, video annotations etc.)
- reference resources
- various types of structured language data (e.g. word lists, dictionaries, thesauri etc.) which can be used for improved organisation, processing and study of primary data
- language technology tools/applications
- tools and integrated applications that perform various types of language processing (e.g. multilingual text alignment, morphological annotation, lemmatisation, parsing, knowledge extraction etc.)
- visualisation tools (e.g. integrated environments for the presentation of texts, multimedia collections, processing results etc.)
The CLARIN:EL Central Aggregator (or Central Catalogue) is the central repository of the CLARIN:EL Infrastructure, which is responsible for
- the harvesting of metadata from the local repositories,
- the organisation and the presentation of the metadata descriptions in a uniform catalogue and
- the provision of access to the language resources to the network members and to the public.
Language Technologies (LTs) are workflows of tools for multilevel analysis, processing, annotation, enrichment and transformation of language data.
Language Processing Services are services allowing the use of Language Resources and Technologies as well as their applications over the web.
A Resource Provider is any organisation or individual that makes Language Resources, Technologies and/or Language Processing Services available to the CLARIN:EL Infrastructure.
With the term Language Data we refer to digital language content of any form and medium, structured or unstructured.
Language Processing Tools are computational tools in the form of software, aiming at the
- processing, and
- annotation of language data.
Organisations wishing to share digital content, language resources and/or language processing tools through the clarin:el infrastructure have to fill in the Membership Application, scan it and send the scanned copy (as a PDF) to the clarin:el Network Coordinator and the Deputy Coordinator.
Individuals wishing to share digital content, language resources and/or language processing tools they have developed through the clarin:el infrastructure can express their interest by filling in the relevant application and sending it to the Network Coordinator and the Deputy Coordinator.
Organisations that are members of the network can set up their own repositories, if they so wish. There they will be able to store, document, manage and curate their resources.
In this case, you need to contact the Network Director and the Technical Administrator of clarin:el; you also need to appoint a technical administrator of your repository (Repository Technical Manager), who will be responsible for setting up the repository and for user management. Detailed information can be found in the Repository Manager's Manual.
In order to deposit their resources, users have to be registered members of one of the repositories of the infrastructure with editor rights (assigned to them by the repository manager).
The procedure for describing and uploading a language resource has the following steps (detailed guidelines, in Greek, can be found here):
- Initially, the user-editor needs to describe the resource, that is, to add its metadata. This can be done in two ways:
- using the dedicated tool (metadata editor) provided by the infrastructure. The tool guides the user-editor in describing the resource, indicating the obligatory fields to be completed. Once all the obligatory fields are filled in, the description is stored.
- by uploading existing descriptions. This is the case of descriptions created independently of the tool; these have to be in the form of xml files. The infrastructure checks the compatibility of the descriptions with the metadata schema used by the infrastructure and, if they are compatible, they are stored.
- Following the storage of the metadata, the next stage is the storage of the data itself. At this stage, the user-editor uploads the data to a given endpoint. This endpoint functions as the local point where all users of the repository put their data files and from which these are harvested by the infrastructure. The data have to be compressed (files 'zip', 'tar.gz', 'gz', 'tgz', 'tar', 'bzip2', or 'bz2').
- At the last stage, the user-editor is informed by email that the process of data uploading has been completed.
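Before starting the procedure above, an editor may find it useful to run a quick local check on the files to be uploaded. The following is a minimal sketch only, not part of the infrastructure: the function names are hypothetical, the XML check only tests well-formedness (the infrastructure's own check against its metadata schema is not reproduced here), and the archive check looks at the file name alone.

```python
import xml.etree.ElementTree as ET

# Compressed formats listed in the upload procedure above.
ACCEPTED_SUFFIXES = (".zip", ".tar.gz", ".gz", ".tgz", ".tar", ".bzip2", ".bz2")


def is_well_formed_xml(text: str) -> bool:
    """Return True if the text parses as XML (no schema validation is done)."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


def has_accepted_suffix(filename: str) -> bool:
    """Return True if the file name ends with one of the accepted suffixes."""
    return filename.lower().endswith(ACCEPTED_SUFFIXES)
```

A description that passes `is_well_formed_xml` can still be rejected by the infrastructure if it does not follow the metadata schema.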
In order to process one of your resources (that is, a resource that is not stored in the infrastructure) with one of the web services of the infrastructure (that is, a language processing tool used over the internet), you first have to select the web service you want and then upload your resource for processing, following these steps:
- from the central inventory of the infrastructure select browse
- filter the catalog of available resources to locate the web service you want
- first by restricting the catalog to only tools/services by selecting Filter by > Resource Type = Tool Service
- and then restrict the results to only web services by selecting Filter by > Processing service = yes
- select the web service you want
- in the new page that opens, you can see the description of the service. Select upload and process to start uploading your resource
- in the new page fill in the description of your resource to be uploaded
- NOTE: the resource must be in a zip file not exceeding 2 MB!
- Select save and start processing
- NOTE: if you are a simple user, the resource you upload as well as the processing outcome will be internally stored for two (2) days in your repository (Institutional or Hosted Resources Repository). After this period, the resources will be deleted. However, you can store them permanently and share them with other users by applying to be an editor within these two days.
- The infrastructure informs you about the progress of the processing, and, when it is completed, you receive an email with a link to the processed resource.
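The two notes in the steps above (the 2 MB zip limit and the two-day storage period for simple users) can be checked locally before uploading. The sketch below is illustrative only: the helper names are hypothetical, and reading "2 MB" as 2 × 1024 × 1024 bytes is an assumption, since the exact limit enforced by the infrastructure is not specified here.

```python
import io
import zipfile
from datetime import datetime, timedelta

MAX_UPLOAD_BYTES = 2 * 1024 * 1024  # assumption: "2 MB" read as 2 MiB
RETENTION = timedelta(days=2)       # temporary storage period for simple users


def zip_bytes(name: str, content: bytes) -> bytes:
    """Pack one file into an in-memory zip archive and return the archive bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(name, content)
    return buffer.getvalue()


def fits_upload_limit(archive: bytes) -> bool:
    """Return True if the archive is no larger than the assumed upload limit."""
    return len(archive) <= MAX_UPLOAD_BYTES


def deletion_time(uploaded_at: datetime) -> datetime:
    """Return the moment after which a simple user's upload may be deleted."""
    return uploaded_at + RETENTION
```

For example, `deletion_time(datetime.now())` gives a rough deadline for applying to become an editor if the uploaded resource and its processing outcome should be kept.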
You can upload only metadata descriptions to clarin:el, while the resource itself might be available from elsewhere (another URL, through the owner etc.). However, resources that are not in clarin:el cannot be processed with the available clarin:el language processing web services. So, if you would like to use these services on your resources, it is best to upload the data as well as the metadata to the clarin:el infrastructure.
Computational Support Services for the clarin:el infrastructure are
- assignment of Persistent Identifiers (PIDs),
- user authentication and authorization services (Authentication and Authorization Infrastructure, AAI),
- storage services, and
- provision of computational power for the execution of web services.
The Hosted Resources Repository (HRR) hosts resources offered by providers that do not maintain a repository and makes them available to the users of the infrastructure. It also hosts the metadata descriptions of these resources and makes them available for harvesting by the central aggregator. Finally, it offers user management services to its users.
Each Institutional Repository hosts the resources of the relevant institution and makes them available to the users of the infrastructure according to the appropriate licenses. It also hosts the metadata descriptions for these resources and makes them available for harvesting by the central aggregator. Finally, it offers user management services to its users.
The Central Aggregator hosts the central resource catalogue of the infrastructure, which contains information for all the resources of the infrastructure. The aggregator gathers this information by harvesting the metadata from the local repositories. Finally, together with the clarin:el portal, it hosts the technical and legal support services.
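The relationship between the local repositories and the Central Aggregator can be pictured with a toy model. This is a conceptual sketch only: the class names, field names and in-memory dictionaries are hypothetical, and it does not represent the actual harvesting protocol or metadata schema used by clarin:el.

```python
from dataclasses import dataclass, field


@dataclass
class Repository:
    """A local repository (institutional or hosted) holding metadata records."""
    name: str
    records: dict[str, dict] = field(default_factory=dict)  # resource id -> metadata


@dataclass
class CentralAggregator:
    """Builds one catalogue from the metadata of all local repositories."""
    catalogue: dict[str, dict] = field(default_factory=dict)

    def harvest(self, repositories: list[Repository]) -> None:
        for repo in repositories:
            for resource_id, metadata in repo.records.items():
                # Copy the description and record which repository it came from.
                self.catalogue[resource_id] = {**metadata, "repository": repo.name}
```

In this model only the metadata is copied to the aggregator; the resources themselves stay in the repository that hosts them.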
Licensor
The person who is legally eligible to license and actually licenses the resource. The licensor could be different from the creator, the distributor or the Intellectual Property Rights (IPR) holder. The licensor has the necessary rights or licences to license the work and is the party that actually licenses the resource that enters the clarin:el network. The licensor will have obtained the necessary rights or licences from the IPR holder and may have a distribution agreement with a distributor that disseminates the work under a set of conditions defined in the specific licence and collects revenue on the licensor's behalf. The attribution of the creator, separately from the attribution of the licensor, may be part of the licence under which the resource is distributed (as is the case, for example, with Creative Commons Licences).
Distribution Rights Holder
The person or organization that holds the distribution rights. The range and scope of distribution rights is defined in the distribution agreement. The distributor in most cases only has a limited licence to distribute the work and collect royalties on behalf of the licensor or the IPR holder and cannot give to any recipient of the work permissions that exceed the scope of the distribution agreement (e.g. to allow uses of the work that are not defined in the distribution agreement).
IPR Holder
The person or organization that holds the full Intellectual Property Rights (copyright, trademark etc.) that subsist in the resource. The IPR holder could be different from the creator, who may have assigned the rights to the IPR holder (e.g. an author as a creator assigns her rights to the publisher, who is the IPR holder), and from the distributor, who holds a specific licence (i.e. a permission) to distribute the work within the clarin:el network.
Open licenses are standard, irrevocable public licenses that allow rights holders to share their work. Open licenses cover all types of work, such as content, data and software.
Every license allows users to use the licensed work subject to the terms that apply. Rights holders may license the work subject to conditions such as attribution of the source, non-commercial use, no derivative works, or share-alike, which means that the work or any derivatives of it must be redistributed under identical terms.
The most common open licenses are
PSI licences constitute a key instrument for the release of huge amounts of information by Public Sector Bodies (PSBs). Useful information on key types of licensing, making reference to PSI licences as well, can be found here.
- Commission encourages re-use of public sector data http://europa.eu/rapid/press-release_IP-14-840_en.htm
- Open data Portals https://ec.europa.eu/digital-agenda/en/open-data-portals
- Copyright Law, 2011/833/EU http://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX:32011D0833