
The aim of the CLARIN Resource Families initiative is to provide a user-friendly overview of the corpora available in the CLARIN infrastructure for researchers in the digital humanities, social sciences and human language technologies. The overviews are organized according to the type of data in the corpora and include listings of corpora sorted by language. CLARIN currently offers overviews of seven resource families:
- Computer-mediated communication corpora
- Historical corpora
- L2 learner corpora
- Newspaper corpora
- Parallel corpora
- Parliamentary corpora
- Spoken corpora
In the future, CLARIN plans to include other resource families, such as manually annotated corpora, and to add tutorials on how to query, annotate and analyse the data.
The overviews have been prepared by Darja Fišer and Jakob Lenardič and have received funding from the European Union's Horizon 2020 research and innovation programme through the projects CLARIN-PLUS and PARTHENOS. CLARIN would like to thank all the User Involvement coordinators, National Coordinators, workshop participants and other individuals who participated in the survey and provided information about the resources.
Computer-mediated communication corpora
- Xenophobia - Verbal Aggressiveness Database (Target Group: Muslims/Islam)
- Xenophobia - Verbal Aggressiveness Database (Target Group: Albanians)
- Xenophobia - Verbal Aggressiveness Database (Target Group: Jews)
- Xenophobia - Verbal Aggressiveness Database (Target Group: Roma)
- Xenophobia - Verbal Aggressiveness Database (Target Group: Pakistani)
- Xenophobia - Verbal Aggressiveness Database (Target Group: Romanians)
- Xenophobia - Verbal Aggressiveness Database (Target Group: Germans)
- Xenophobia - Verbal Aggressiveness Database (Target Group: Syrians)
- Xenophobia - Verbal Aggressiveness Database (Target Group: Immigrants)
- Xenophobia - Verbal Aggressiveness Database (Target Group: Refugees)
- Xenophobia - Event Database
Historical corpora
L2 learner corpora
Newspaper corpora
- ACCURAT corpus of comparable sentences
- QTLP Greek CC Corpus for the Medical Domain
- QTLP Greek Corpus for the Medical Domain
- QTLP Greek Corpus for the Automotive Domain
- Parallel Global Voices
- SETIMES - A parallel corpus of the Balkan languages
- QTLP Portuguese-Greek Corpus for the AUTOMOTIVE domain
- QTLP English-Greek Corpus for the AUTOMOTIVE domain
- QTLP English-Greek Corpus for the MEDICAL domain
- Hellenic National Corpus
- Theodoros Pangalos' Articles in the Greek Newspaper "To Vima"
- Corpus "Library and Information Centre - Newspapers"
- Modern Greek Texts Corpus - "Makedonia" newspaper
- Modern Greek Texts Corpus - "Ta Nea" newspaper
Parallel corpora
- Parallel corpus newsletters IFT FR-GR
- Europarl Parallel Corpus
- JRC-Acquis Multilingual Parallel Corpus
- MLCC Multilingual and Parallel Corpora
- MUSA Multilingual Multimodal Corpus
- PANACEA English-French and English-Greek parallel corpus
- PELCRA multilingual parallel corpora
- SETimes
- REVEAL-THIS Corpus
- ACCURAT balanced test corpus for under resourced languages
- UP/TAP
- European Parliament Proceedings Parallel Corpus 1996-2011, parallel corpus Greek-English
- EMEA Corpus
- ECDC Translation Memory
- DGT-Translation Memory
- DGT-Acquis
- EAC Translation Memory
- A parallel corpus collected from the European Constitution
- A parallel corpus of KDE4 localization files (v.2)
- European Central Bank parallel corpus
- OpenSubtitles2011
- SPC - Stockholm Parallel Corpora
- DGT-TM-2016
- OpenSubtitles
- QTLP English-Greek Corpus for the MEDICAL domain
- QTLP German-Greek Corpus for the MEDICAL domain
- QTLP Portuguese-Greek Corpus for the MEDICAL domain
- QTLP English-Greek Corpus for the AUTOMOTIVE domain
- QTLP Portuguese-Greek Corpus for the AUTOMOTIVE domain
- The JRC-Acquis Corpus, version 3.0
- FREL
- Text Corpus - EMEL
- Interlingual Perspectives
- aformes
- GLOSSOLOGIA
- Modern Greek Grammars (17th c. - middle 20th c.)
- Civitas Gentium
- tekmirion. (issue 5)
- Conference Proceedings "Creative Writing" 1
- Conference Proceedings "Creative Writing" 2
- Official Journal of the European Union_oj4-ps-1
- Official Journal of the European Union_oj4-fd-2
- Official Journal of the European Union_oj4-pc-1
- Official Journal of the European Union_oj4-ss-1
- DICTA-SIGN corpus
- INTERA Corpus - the Greek-English part
- Greek-Bulgarian Bul-TM parallel corpus