In English

The Budapesti Egyetemi Kollégiumi Korpusz (Budapest University Dormitory Corpus), abbreviated as BEKK, is a Hungarian linguistic corpus that has been collected since 2015 within the framework of the BEKK research project. It consists of recorded, transcribed and annotated spontaneous conversations among Hungarian university students who come from the countryside but study in the capital, Budapest, and live in dormitories (in Hungarian, kollégium; these belong to both colleges and universities and house students at both Bachelor's and Master's level). The corpus is unique in offering almost 20 hours of spontaneous speech recorded in the dormitory rooms by the participants themselves.

The transcripts of the corpus are being uploaded to this website for open access. A sample is available under the menu item "Mutatványok" (Samples). The audio material will be made available for research purposes, subject to each participant's consent.

Publications related to the BEKK project, some of them in English, are listed under "Publikációk" (Publications). Our contact details can be found under "Kapcsolat" (Contact).

About the method

Budapest dormitories are typically located in tall, multi-storey buildings, with rooms lining long corridors on each floor. A room is usually shared by 3–4 students of the same gender, aged between 18 and 23, who have been assigned to rooms at random by the administration. This means that roommates may not even know each other when the first academic year starts, but they nevertheless spend most weekday nights together in the room, and many of them naturally become friends. Most students regularly leave the dormitory on weekends to visit their families in the countryside.

To capture linguistic interactions with as little external monitoring as possible, the BEKK project involved students from sociolinguistics classes as peer researchers, who contacted various dormitories across Budapest and recruited participants among their residents. Once everyone staying in a room had given written consent, the peer researchers asked the residents to record any conversation taking place in the room, with no one involved in the research present. In the end, we received recordings of at least three hours (in some cases made in several parts) from 15 rooms in various dormitories, made by (mostly same-gender) groups of 2–4 students, mostly on the participants' own mobile devices. The recordings were transcribed, manually annotated and pseudonymized using Bihalbocs, a software tool that synchronizes the transcript with the corresponding audio excerpt. The annotation reflects the primary focus of the BEKK project, namely the linguistic construction of the categories of gender, sexuality and ethnicity.

Furthermore, after the recording period, the peer researchers conducted structured group interviews with the dormitory participants. In these interviews, which lasted 1–1.5 hours, students were asked to discuss dormitory life and language use, as well as short texts about recent high-profile Hungarian cases related to anti-Roma racism, sexual abuse on campus, and homophobia surrounding the Budapest Pride marches.