From the parliamentary benches to CLARIN:EL
On the occasion of the International Day of Parliamentarism (30 June 2023), we dedicate this month to resources hosted in the CLARIN:EL Infrastructure that belong to the Parliamentary Discourse Resource Family.
Maria Gavriilidou (Scientific Responsible of the Athena RC Repository at CLARIN:EL) and Dimitris Gkoumas (Scientific Associate of CLARIN:EL) talk about two resources on parliamentary discourse developed by ILSP/Athena RC.
The first resource, Greek Parliament Plenary Sessions (1989-2019), is a collection of the raw minutes of the Greek Parliament's plenary sessions over the last 30 years. The corpus contains more than 1,000,000 speeches in TXT format. The CLARIN:EL Infrastructure also offers smaller subsets of the corpus, each corresponding to a specific parliamentary period, so that users can isolate particular historical periods or process the subcorpora with the Infrastructure's web services. The resource is freely available for research purposes through CLARIN:EL under a CC BY (Attribution) licence. Users can also visit the page here to explore an interactive visualisation of the topics that concerned the Greek Parliament between 1989 and 2015. With this visualisation, users can browse the topics discussed in each period, download the related speeches, and identify the speakers and their parties. Topics can also be searched by date.
The second resource was developed within the ParlaMint project. ParlaMint, funded by CLARIN ERIC, creates comparable, uniformly annotated multilingual corpora of parliamentary sessions from different European countries. The project's goal has been to turn diverse contemporary national parliamentary data into accessible data that closely reflect recent events, including those with a global impact on human health, social life and the economy, such as the COVID-19 pandemic. Presenting these data in a cross-linguistic context under common taxonomies (legislatures, organisations, named entities, speaker types, morphosyntactic dependencies) supports both diachronic and synchronic study and enables researchers and the general public to follow developments and debates at a pan-European level.
The Greek corpus (ParlaMint-GR) was built from data covering 1 January 2015 to 1 February 2022. The material is organised into parliamentary periods, each consisting of regular sessions. Each session is in turn divided into one or more sittings (e.g. morning and afternoon). Within this structure, the data are organised around individual speeches by Members of Parliament, which were identified automatically.
Each speech is accompanied by metadata as well as structural and linguistic annotation. The metadata include the speaker's name, gender, party affiliation (and any change of party) and role (e.g. prime minister, minister, party leader, speaker of parliament). For each government, the corresponding term is provided: the start and end dates of the term, its ministers, their respective ministries with the relevant dates, and any resignations or removals. Both the metadata and the annotation were added automatically using tools developed at ILSP/Athena RC.
This resource is available through the Slovenian National Repository, which, as project coordinator, is responsible for its publication. The version without linguistic annotations is available here, and the linguistically annotated version is available here.

Maria Gavriilidou
Researcher at ILSP/Athena RC
Deputy Director of CLARIN:EL Infrastructure & Scientific Responsible of Athena RC Repository at CLARIN:EL
