The European Language Resource Coordination (ELRC) was initiated in 2015 to collect language resources in all official European languages, as well as Norwegian Bokmål, Norwegian Nynorsk and Icelandic, with a special focus on bi- and multilingual language data from various domains. The initial purpose was to collect such data to train CEF eTranslation, the Machine Translation service of the European Commission, which can be used free of charge by public administrations and public services in the EU Member States, Norway and Iceland, as well as by academia, NGOs and SMEs.

To further support the sharing of language data in Europe, ELRC conducted a first investigation among public services in 2019 (ELRC, 2019) with a view to identifying the key stakeholders and mechanisms for the efficient sharing of language data in EU Member States, Norway and Iceland. This investigation led to the first edition of the ELRC White Paper, published in December 2019 and titled Sustainable Language Data Sharing to Support Language Equality in Multilingual Europe – Why language data matters. Together with the ELRC National Anchor Points from all EU Member States, Iceland and Norway, European practices for sharing language data and the related challenges were investigated, and recommendations on how to address these challenges in the future were prepared. In addition, the first White Paper edition provided a country profile for each of the CEF-affiliated countries, covering the following topics:

  • National translation practices and information exchange in ministries and public administrations
  • Translation needs of the country
  • Language data creation and sharing infrastructure
  • National open data policies
  • Key stakeholders
  • Main challenges for sustainable data sharing
  • Required actions to overcome the identified challenges

Almost three years after the first report, ELRC publishes the second edition of the White Paper in November 2022. It compares the results of the 2019 analysis with the status quo of 2022, illustrating the latest developments, recent changes and achievements. While the initial scope of the White Paper was to report on the practices, challenges and recommendations for sustainable language data sharing within public services, the new edition reflects the increasing importance of Artificial Intelligence (AI) and Language Technology (LT) across all European countries and sectors. It focuses on the role of LT and language data in all EU Member States, Iceland and Norway, and critically discusses whether the value of LT and language data has been recognised or whether further awareness-raising actions are required, taking into account recent developments as well as national regulations related to AI. Thus, the updated country profiles, in addition to their original contents, also provide insights into:

  • The role of LT and language data in each country’s AI policies
  • Major AI networks, projects and players in the particular country
  • Data collection efforts and repositories in the country

Greece has participated in the ELRC Consortium since 2015, with Stelios Piperidis, Head of the Natural Language Processing and Language Infrastructures (NLPLI) Department of ILSP/Athena RC, appointed as the Greek National Coordinator. Maria Gavriilidou, Senior Researcher at ILSP/Athena RC and Deputy National Coordinator of the CLARIN:EL Research Infrastructure for Language Resources and Technologies, and Maria Giagkou, Scientific Associate at ILSP/Athena RC, member of the ELRC Consortium and Project Manager of the Action on CEF Automated Translation Core Service Platform (ELRC3) (service contract for the European Commission), contributed significantly as authors to the ELRC White Paper 2022.

You can read the new edition of the ELRC White Paper, AI for Multilingual Europe – Why Language Data Matters, here.