Archive for November 2011

 
 

Statistical Language Models for Alternative Sequence Selection

by Herman Stehouwer

Is there a need to limit certain aspects of statistical language models?

Is it necessary to pre-limit the size of the n-gram?

Is linguistic annotation useful in alternative sequence selection tasks?

According to a new study by Herman Stehouwer, the size of the n-gram need not be fixed in advance: it can vary freely depending on the situation. The study also finds that adding certain linguistic annotations, specifically part-of-speech tags and dependency parses, did not help the model make better decisions.
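To make the task concrete: in alternative sequence selection, a language model is handed a set of candidate word sequences and has to pick the most plausible one. The following is only a minimal sketch of that idea, not the method used in the thesis. It assumes a fixed bigram model with add-one smoothing (a simplification, since the study argues the n-gram size can stay flexible), and the toy corpus, the then/than example, and all function names are made up for illustration.

```python
from collections import Counter
import math


def train_bigrams(sentences):
    """Count unigrams and bigrams over whitespace-tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def log_score(sentence, unigrams, bigrams):
    """Log-probability of a sentence under an add-one smoothed bigram model."""
    tokens = ["<s>"] + sentence.lower().split()
    vocab_size = len(unigrams)
    total = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        # Add-one smoothing keeps unseen bigrams from scoring log(0).
        total += math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size))
    return total


def select_alternative(alternatives, unigrams, bigrams):
    """Return the candidate sequence the model scores highest."""
    return max(alternatives, key=lambda s: log_score(s, unigrams, bigrams))


# Hypothetical toy corpus; a real model would be trained on far more text.
corpus = [
    "she is taller than her brother",
    "he is older than me",
    "we ate and then we left",
    "first think then speak",
]
unigrams, bigrams = train_bigrams(corpus)

alternatives = ["he is taller than me", "he is taller then me"]
best = select_alternative(alternatives, unigrams, bigrams)
print(best)
```

On this toy corpus the sketch prefers the "than" variant, because "taller than" and "than me" were both seen in training while "taller then" and "then me" were not.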

The study compares the ability of a language model to select the correct alternative from sets of alternatives in hundreds of experiments. These experiments were performed for three different alternative sequence selection tasks, for four different annotations (and also for no annotation), and for four different ways of combining the annotation with the text. The results are reported in the thesis “Statistical Language Models for Alternative Sequence Selection”, which will be defended on the 7th of December at 18:00 in the Aula of Tilburg University.

To coincide with the defense, a colloquium on language modeling is being organized, with invited talks by Colin de la Higuera, Louis ten Bosch, and Antal van den Bosch. For more information on the colloquium, send an e-mail to herman.stehouwer [at] mpi.nl or visit its website.

The Language Archive officially launched

by Sebastian Drude

On Tuesday, the 11th of October 2011, “The Language Archive” (TLA), the new unit of the Max Planck Institute for Psycholinguistics, was officially launched at a public event with more than 150 guests and speeches by eminent representatives from Germany and the Netherlands.

Many more people showed up than expected: there were not even enough seats for all the guests at the launch of TLA in the headquarters of the Berlin-Brandenburgische Akademie der Wissenschaften (BBAW) at the Gendarmenmarkt in the center of Berlin. The BBAW is one of the three supporting institutions of TLA, together with the Dutch Koninklijke Nederlandse Akademie van Wetenschappen (KNAW) and the German Max-Planck-Gesellschaft (MPG).

The guests were offered coffee and snacks but, above all, plenty of substance: five eminent representatives of the new unit's major stakeholders gave fascinating talks on a range of topics, all related to the ongoing and future activities of TLA. Three of them represented the supporting institutions: Wolfgang Klein for the MPG, Angelika Storrer for the BBAW, and Theo Mulder for the KNAW. Wilhelm Krull represented the Volkswagenstiftung, the funding agency that has supported the programme “Documentation of Endangered Languages” (DOBES) since 2000, and Nikolaus P. Himmelmann represented the DOBES programme itself. The DOBES archive is in many respects the core of the archive hosted by TLA. After the talks, Paul Trilsbeek provided a look into the archive itself.

The full program and topics of the speeches

Welcome and objectives for the language archive
Prof. Dr. Wolfgang Klein
Director at the Max Planck Institute for Psycholinguistics

Language research and language documentation in the digital age
Prof. Dr. Angelika Storrer
Zentrum Sprache (Language Centre) of the BBAW

E-science: a major challenge for the humanities
Prof. Dr. Theo Mulder
Director of Research of the KNAW

Documentation of endangered languages: a task for science and society
Dr. Wilhelm Krull
Secretary General of the VolkswagenStiftung

How linguistics found (and is finding) its way to empirical data
Prof. Dr. Nikolaus P. Himmelmann
University of Cologne

A look into the archive
(interactive presentation)

The TLA Opening in the media: