Summary of the 2011 CLARA Summer School

by Przemek Lenkiewicz

The CLARA Summer School on Infrastructure Tool Development took place at the Max Planck Institute for Psycholinguistics from 5th to 12th July 2011.

Participants came from several institutions, including the University of Bielefeld, the Technical University of Aachen, Gießen University and the Technical School of Mittelhessen. Some Max Planck staff members also took part in parts of the summer school, especially those requiring less technical expertise. Together they formed an inspiring and productive group that carried out the tasks planned for the event and also came up with new ideas for useful tools, which were likewise implemented during the summer school.

On the first day, Przemek Lenkiewicz opened the summer school and introduced participants to the agenda and the extra activities. Participants were also encouraged to present themselves and their work, giving an idea of how they use ELAN and what they were hoping to learn at the event.

Later, Han Sloetjes, the main developer of ELAN, presented the annotation tool and introduced its mechanisms for creating and integrating extensions (recognizers). Some users said that although they had used ELAN for quite a long time, they had not been aware that its functionality could be extended, or that doing so was so simple. Han spent the whole day with the participants to clear up any doubts they had. He also joined on the following days and took part in the development sessions.

Stefano Masneri with participants

Days 2-4 of the event were about signal processing techniques. Stefano Masneri of Fraunhofer HHI Berlin and Dr. Rolf Bardeli of Fraunhofer IAIS Sankt Augustin introduced the participants to the basics of video and audio processing. In the afternoon hands-on sessions, participants developed some simple video and audio processing algorithms, such as histogram calculations for both audio and video, color-to-greyscale conversion and image flipping. More advanced functionality was also developed, such as detecting a person’s hand in a video using an edge detector as the base, or detecting fricatives in a speech recording using thresholding.
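
To give a flavour of the simpler exercises named above, here is a minimal Python sketch of color-to-greyscale conversion, horizontal image flipping and a greyscale histogram. It is not the code written at the summer school: the function names, the nested-list image representation and the choice of ITU-R BT.601 luma weights are all assumptions made for illustration.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # (red, green, blue), each 0..255 (assumed 8-bit)
Image = List[List[Pixel]]      # rows of pixels (hypothetical representation)


def to_greyscale(image: Image) -> List[List[int]]:
    """Convert an RGB image to greyscale using BT.601 luma weights (an assumption)."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in image
    ]


def flip_horizontal(image: list) -> list:
    """Mirror the image left to right by reversing every row."""
    return [list(reversed(row)) for row in image]


def histogram(grey: List[List[int]], bins: int = 256) -> List[int]:
    """Count how many 8-bit grey values fall into each of `bins` equal-width bins."""
    counts = [0] * bins
    for row in grey:
        for value in row:
            counts[value * bins // 256] += 1
    return counts


# A tiny 2x2 test image: white, black, red, blue.
image = [
    [(255, 255, 255), (0, 0, 0)],
    [(255, 0, 0), (0, 0, 255)],
]
grey = to_greyscale(image)
flipped = flip_horizontal(image)
counts = histogram(grey, bins=4)
```

The same histogram idea carries over to audio by binning sample amplitudes instead of pixel intensities.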

The last two days of the summer school were led by Przemek Lenkiewicz and Eric Auer. In a brainstorming session with the participants we defined two recognizers that they were interested in developing: one for automatically importing eye-tracking data into ELAN and representing it as annotations and curves, and one for comparing two tiers based on the similarity of their annotations. Both recognizers were successfully developed by the end of the summer school.
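
The idea of comparing two tiers can be sketched roughly as follows. This is only an illustration under stated assumptions, not the recognizer built at the summer school: the `Annotation` type, the pairing of annotations by largest time overlap, and the use of Python's `difflib.SequenceMatcher` as the similarity measure are all hypothetical choices, since the text above does not say how similarity was defined.

```python
from difflib import SequenceMatcher
from typing import List, NamedTuple, Optional, Tuple


class Annotation(NamedTuple):
    """A hypothetical, simplified annotation: a time span plus a text value."""
    start_ms: int
    end_ms: int
    value: str


def overlap_ms(a: Annotation, b: Annotation) -> int:
    """Length of the time overlap between two annotations (0 if they do not overlap)."""
    return max(0, min(a.end_ms, b.end_ms) - max(a.start_ms, b.start_ms))


def compare_tiers(
    tier_a: List[Annotation], tier_b: List[Annotation]
) -> List[Tuple[Annotation, Optional[Annotation], float]]:
    """Pair each annotation in tier_a with the tier_b annotation that overlaps it
    most in time, and score the text similarity of the pair between 0.0 and 1.0."""
    results = []
    for a in tier_a:
        best = max(tier_b, key=lambda b: overlap_ms(a, b), default=None)
        if best is None or overlap_ms(a, best) == 0:
            # No counterpart on the other tier: report it with a score of 0.0.
            results.append((a, None, 0.0))
        else:
            score = SequenceMatcher(None, a.value, best.value).ratio()
            results.append((a, best, score))
    return results


tier_a = [Annotation(0, 1000, "hello"), Annotation(2000, 3000, "world")]
tier_b = [Annotation(100, 900, "hello"), Annotation(5000, 6000, "x")]
results = compare_tiers(tier_a, tier_b)
```

Pairing by time overlap keeps the comparison meaningful when the two tiers were segmented independently; a character-based ratio is only one of many possible similarity measures.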

Przemek Lenkiewicz and Eric Auer

Since the summer school included the weekend, the group met and explored Nijmegen for a while. On Monday, July 11th, we also had dinner together in a nice Dutch restaurant.

Additional pictures from the event can be found on this web page.

After the event, participants filled in a survey and rated the summer school highly for its content, the way it was delivered and its overall organization. Given the good feedback, another Summer School on Infrastructure Tool Development might take place at Max Planck in summer 2012. Anyone interested in participating should contact Przemek Lenkiewicz.




The CLARA Project

by Przemek Lenkiewicz

Recently the Max Planck Institute started its participation in a very interesting project called CLARA. The name stands for Common Language Resources and their Applications. It is a European project that runs under the Initial Training Network framework of the Marie-Curie Actions.

CLARA offers posts for researchers, both PhD students and postdocs. The project will train a new generation of researchers who will be able to cooperate across national boundaries on the establishment of a common language resources infrastructure and its exploitation for the construction of the next generation of language models with wide theoretical and applied significance. The work of CLARA researchers will focus on two main goals:

  • to develop the next generation of data-intensive language models and applications by integrating approaches across language and country boundaries;
  • to contribute to the establishment of a pan-European infrastructure for language resources.

Recent advances in technology and widespread research efforts have expanded the size of corpora and the extent of their annotations. From corpora as basic resources, other resources are being derived, e.g. lexicons, frequency lists, word nets, term banks, etc. Although a large number of language resources have been produced to date, many scientific and organizational challenges remain, including the following:

  • Theories and modeling approaches have not yet been applied to a wide range of languages;
  • The gap between academic models and the needs of industrial actors who aim at real-life applications remains to be bridged;
  • There is a lack of appropriate documentation for many resources. Moreover, there is no good overview of available resources for some European languages;
  • Since some resources are developed for specific purposes, converting them so that they can be reused for other purposes is a challenge;
  • The long term preservation of language resources needs to be secured;
  • Efficiency issues in accessing language resources in very large repositories must be addressed.

CLARA researchers are meant to address these challenges by means such as:

  • further work on standardization of coding and annotation practices;
  • development of registries and documentation systems for language resources;
  • transfer and integration of single-purpose resources to interoperable, reusable and extendable forms.

The Max Planck Institute is hosting three researchers of the CLARA project: two PhD students and one postdoc. Their work will be organized as a contribution to the AVATecH project, which aims at developing methods for automated annotation creation and thus addresses the areas of interest of the CLARA project.

People involved:
Peter Wittenburg – Scientist in charge.
Perry Janssen – Administrative contact.
Przemek Lenkiewicz – Experienced Researcher, Scientific contact.
Hugo García Blanco – Early Stage Researcher.
Binyam Gebrekidan Gebre – Early Stage Researcher.

The International CLARA Summer School

by Thomas Koller

The Max Planck Institute for Psycholinguistics is proud to offer an international CLARA summer school on “Advanced Resource Creation, Archiving and Usage” in Nijmegen (Netherlands). The summer school topics will be taught by experienced external specialists and MPI experts. It will take place at the Institute from July 5th to July 16th, 2010.

The summer school is part of the European CLARA project (Common Language Resources and their Applications). CLARA is a Marie Curie Initial Training Network which aims to offer early-stage researchers the opportunity to improve their research skills, to join established research teams and to enhance their career prospects.

Participating in this summer school will allow young researchers to gain a deep understanding of modern methodologies and technologies to create, archive and use sharable language resources. The aim is to train young researchers in how to use modern technology to create language resources, in particular when the source material consists of multimedia streams. Additionally, they will learn how the resulting complex resource types can be archived, how they can be accessed and analyzed via state-of-the-art (web) applications and how they can be enriched.

The CLARA summer school has already attracted a varied and interesting group of young researchers and is fully booked.

More information on the CLARA summer school can be found at the MPI website.