Archive for the Category Workshops

 
 

Summary of the 2011 CLARA Summer School

by Przemek Lenkiewicz

The CLARA Summer School on Infrastructure Tool Development took place at the Max Planck Institute for Psycholinguistics from 5 to 12 July 2011.

Participants came from several institutions, including the University of Bielefeld, the Technical University of Aachen, Gießen University and the Technical School of Mittelhessen. Some Max Planck staff members also took part in parts of the summer school, especially those requiring less technical expertise. Together they formed an inspiring and productive group that carried out the tasks planned for the event and also came up with new ideas for useful tools, which were implemented during the summer school as well.

On the first day Przemek Lenkiewicz opened the summer school and introduced participants to the agenda and all extra activities. Participants were also encouraged to introduce themselves and their work, giving an idea of how they use ELAN and what they were hoping to learn at the event.

Later Han Sloetjes, the main developer of ELAN, presented the annotation tool and introduced its mechanisms for creating and integrating extensions (recognizers). Some users said that although they had used ELAN for quite a long time, they had not been aware that its functionality can be extended, or that doing so is so simple. Han spent the whole day with the participants to clear up any doubts they had. He also joined on the following days and took part in the development sessions.

Stefano Masneri with participants

Days 2-4 of the event were devoted to signal processing techniques. Stefano Masneri of Fraunhofer HHI Berlin and Dr. Rolf Bardeli of Fraunhofer IAIS Sankt Augustin introduced the participants to the basics of video and audio processing. In the afternoon hands-on sessions participants developed some simple video/audio processing algorithms, such as histogram calculations for both audio and video, color-to-greyscale conversion and image flipping. More advanced functionality was developed as well, such as detecting a person’s hand in a video using an edge detector as the basis, or detecting fricatives in a speech recording using thresholding.
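To give a flavour of the simpler exercises mentioned above, here is a minimal Python sketch. It is not the code written at the summer school. It assumes an image is a list of rows of (r, g, b) tuples, uses the common luma weights for the greyscale conversion, and treats thresholding as finding runs in a generic feature sequence, since the post does not say which audio feature was thresholded. All function names are hypothetical.

```python
# Illustrative sketch only; the representation and names are assumptions.
# An image is a list of rows; each pixel is an (r, g, b) tuple of ints 0-255.

def to_greyscale(image):
    """Convert RGB pixels to grey levels using the common luma weights."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in image]

def flip_horizontal(image):
    """Mirror each row left to right."""
    return [list(reversed(row)) for row in image]

def histogram(grey_image, bins=256):
    """Count how often each grey level occurs."""
    counts = [0] * bins
    for row in grey_image:
        for level in row:
            counts[level] += 1
    return counts

def threshold_segments(samples, limit):
    """Return (start, end) index pairs where abs(sample) >= limit.

    The end index is exclusive. For audio, `samples` would be some
    per-frame feature; which feature to use is left open here.
    """
    segments = []
    start = None
    for i, value in enumerate(samples):
        if abs(value) >= limit:
            if start is None:
                start = i
        elif start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(samples)))
    return segments
```

In a recognizer, segments found this way could then be turned into annotations on a tier.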

The last two days of the summer school were led by Przemek Lenkiewicz and Eric Auer. In a brainstorming session with the participants we defined two recognizers that they were interested in developing: one that automatically imports eye-tracking data into ELAN and represents it as annotations and curves, and one that compares two tiers based on the similarity of their annotations. Both recognizers were successfully developed by the end of the summer school.
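The idea of comparing two tiers can be sketched roughly as follows. This is only an illustration and not the recognizer built at the summer school: it does not use ELAN's recognizer API, it assumes annotations are plain (start_ms, end_ms, value) tuples, and it uses `difflib.SequenceMatcher` from the Python standard library as a stand-in similarity measure, since the post does not say how similarity was defined.

```python
from difflib import SequenceMatcher

# Illustrative sketch only. An annotation is assumed to be a tuple
# (start_ms, end_ms, value); a tier is a list of such tuples.

def overlap_ms(a, b):
    """Length of the time overlap between two annotations, in ms."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))

def compare_tiers(tier_a, tier_b):
    """Match each annotation in tier_a to the tier_b annotation with the
    largest time overlap and score the similarity of their values (0..1).

    Returns a list of (annotation_a, matched_annotation_b_or_None, score).
    """
    results = []
    for ann in tier_a:
        best = max(tier_b, key=lambda other: overlap_ms(ann, other),
                   default=None)
        if best is None or overlap_ms(ann, best) == 0:
            results.append((ann, None, 0.0))
        else:
            score = SequenceMatcher(None, ann[2], best[2]).ratio()
            results.append((ann, best, score))
    return results
```

Matching by largest overlap is one possible design choice; matching by nearest start time or requiring a minimum overlap would be equally plausible.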

Przemek Lenkiewicz and Eric Auer

Since the summer school included the weekend, the group met and explored Nijmegen for a while. On Monday July 11th we also had dinner together in a nice Dutch restaurant.

Additional pictures from the event can be found on this web page.

After the event participants filled in a survey and rated the summer school very highly for its content, the way it was delivered and its overall organization. Given the good feedback, another Summer School on Infrastructure Tool Development might take place at Max Planck in summer 2012. Anyone interested in participating should contact Przemek Lenkiewicz.




Metadata Workshop

by Dieter van Uytvanck

On September 7 and 8 a workshop on the use of metadata within European research infrastructures was held at the MPI in Nijmegen. Representatives from a broad range of fields (from high-energy physics through biodiversity to linguistics) gathered to explain their particular views on metadata.

It soon became clear that although the differences between closely related disciplines can be overcome, there are huge gaps between others. While in the humanities metadata is generally carefully hand-crafted, this is completely infeasible for the enormous amounts of data produced by sensors in the physics world.

Despite all the differences between the communities, some common goals for the future were identified. Among them are the need to build an infrastructure from re-usable metadata components and the need for access to shared ontologies and vocabularies.

Bringing together all the conclusions of the workshop, a document was written as the basis of a proposal to the European Commission for collaboration in the field of metadata. This can be found here.

More information and the presentations of both days are available at the workshop’s website.

RELISH workshop on lexicon standards and lexicon tools

by Jacquelijn Ringersma

On August 4 and 5, the RELISH project organized a workshop on lexicon standards and lexicon tools at the MPI in Nijmegen. The workshop brought together field linguists and NLP experts to discuss the approaches, standards, tools and interoperability of lexical resources. The aim of the workshop was to build a shared understanding of the requirements for lexicon tools and, if possible, to design concrete steps towards further harmonization.

In the RELISH project (Rendering Endangered Languages Lexicons Interoperable through Standards Harmonization), funded by NEH and DFG, the MPI works together with the University of Frankfurt and Eastern Michigan University. The project aims to unify two major collections of digitized lexicons of endangered languages in order to create a searchable virtual archive.

In the workshop, there were presentations from field linguists and from members of the NLP community. The presentations showed that there is some difference in focus and approach. Where the field linguist aims at a content-rich resource that can be used both for research purposes and for dissemination to the speech community, NLP looks for an infrastructure covering “all” language resources and tools. As a logical result, standardization and interoperability seem to be more important for the NLP community, although they are certainly not irrelevant for the field linguist. Nevertheless, both ‘parties’ felt that sharing information on standards and interoperability was very useful.

In the workshop there were also presentations on LMF and ISOcat (the ISO standards for lexical resources) and on LIFT and GOLD (the US standards for lexical resources). The presentations and discussions showed that on both sides of the Atlantic interesting moves have been made towards standardization, and that the gap between the two does not seem to be as wide as the ocean in question.

In the final 6 months of the RELISH project the parties involved will work on bridging the gap between LMF/ISOcat and LIFT/GOLD and develop an interchange format. Since RELISH brings together organizations that have been instrumental in promoting both endangered languages documentation and standards-development in Europe and the US, the success of RELISH will provide impetus for other standards-harmonization efforts, as well as offer the scientific research community integrated access to important new digital materials.

Presentations of the workshop are available from the Event page on the MPI website.

The International CLARA Summer School

by Thomas Koller

The Max Planck Institute for Psycholinguistics is proud to offer an international CLARA summer school on “Advanced Resource Creation, Archiving and Usage” in Nijmegen (Netherlands). The summer school topics will be taught by experienced external specialists and MPI experts. It will take place at the Institute from July 5th to July 16th, 2010.

The summer school is part of the European CLARA project (Common Language Resources and their Applications). CLARA is a Marie Curie Initial Training Network which aims to offer early-stage researchers the opportunity to improve their research skills, to join established research teams and to enhance their career prospects.

Participating in this summer school will allow young researchers to gain a deep understanding of modern methodologies and technologies for creating, archiving and using sharable language resources. The aim is to train young researchers in how to use modern technology to create language resources, in particular when the source material consists of multimedia streams. Additionally they will learn how the resulting complex resource types can be archived, how they can be accessed and analyzed via state-of-the-art (web) applications and how they can be enriched.

The CLARA summer school has already attracted a varied and interesting group of young researchers and is fully booked.

More information on the CLARA summer school can be found at the MPI website.

The CLARIN-NL metadata tutorial

by Dieter van Uytvanck

On Friday May 27, about 25 people gathered at the Max Planck Institute in Nijmegen to attend a workshop on the practical use of the Component Metadata Infrastructure (CMDI) for the description of language resources. CMDI is the metadata part of CLARIN, a European initiative to create a Common Language Resources Infrastructure.

After a short introduction to metadata in general and a brief historical sketch, the concepts behind CMDI were introduced. The core ideas behind the new metadata format are modularity, reusability, and the use of data categories. A special session was dedicated to the use of ISOcat, the reference implementation of a data category registry. The idea behind this is to have a dependable definition of what is meant by a data category such as Part of Speech. This way it doesn’t matter what you call it or how you spell it in your particular metadata schema: the connection to similar schemata is always clear.
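The data category idea can be illustrated with a small Python sketch. It is an assumption-laden toy, not CMDI or ISOcat code: the two schemas, their field names and the `example:` identifiers are all invented for illustration and are not real registry entries.

```python
# Illustrative sketch only. Each schema maps its own field name to a
# shared data category identifier. The identifiers are hypothetical
# placeholders, not real ISOcat entries.
SCHEMA_A = {"POS": "example:partOfSpeech", "lang": "example:languageName"}
SCHEMA_B = {"part-of-speech": "example:partOfSpeech",
            "Language": "example:languageName"}

def align_fields(schema_a, schema_b):
    """For each field in schema_a, list the schema_b fields that point
    to the same data category, regardless of how they are named."""
    by_category = {}
    for field, category in schema_b.items():
        by_category.setdefault(category, []).append(field)
    return {field: by_category.get(category, [])
            for field, category in schema_a.items()}
```

Here `align_fields(SCHEMA_A, SCHEMA_B)` pairs "POS" with "part-of-speech" even though the two names differ, because both refer to the same category.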

After these more general introductions, the specific CMDI software was presented.

First the Component Registry was shown. It is a web application that can be used for inspecting, searching, creating and editing CMDI metadata components. Afterwards it was illustrated how to create CMDI metadata files using a version of Arbil that has been modified to interact directly with the Component Registry. Both Arbil and the Component Registry are developed by the Max Planck Institute for Psycholinguistics and were presented by their respective developers. Although both applications are still under development, it was clear that they can already be used to produce CMDI metadata.

All slides of the presentations can be downloaded from the CLARIN NL website.

More information about CMDI, including links to the software so you can try it out yourself, can be found on the main CLARIN site.

LEXUS and ViCoS: a software ‘couple’ in the LAT suite

by Jacquelijn Ringersma

LEXUS is our online tool for the creation of multimedia lexica and encyclopedic dictionaries. LEXUS is targeted at linguists involved in language documentation, but is also actively used by researchers in sign language research. LEXUS is based on the ISO recommendation for Language Resource Management (ISO TC37/SC4), providing a Lexical Markup Framework (LMF) lexicon structure and a concept naming registry (ISOcat). With LEXUS, users can create lexica from scratch, but also import lexica created in Toolbox or other XML-based tools. Lexica using LMF and ISOcat are interoperable with each other, allowing for multi-lexicon searches and merging of lexica. Users may customize views of the word list and lexical entries. Standard functionality, such as sorting or filtering of word lists, is already available and we are currently working on paper output options. One of the major strengths of the online tool is that users may share their lexica with other users, either on a read-only or a read/write basis.

ViCoS is an extension of LEXUS, with which users can create relations between lexical entries, using fuzzily defined relation types. The result of this network of relations can be a conceptual space, where each word is represented as an element in a network of other related words. Relations can be ‘universal’ (e.g. A_is_a_B) or specifically defined for a particular lexicon (A_eats_B). In its current version ViCoS can only be used from the LEXUS user interface, since the words are the basis of the conceptual space. Future plans for ViCoS envisage that the tool will be central in the creation of a customized ‘eScience environment’, a user-defined workspace where users can link any type of resource into new organizational layers.
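A network of typed relations of this kind can be sketched in a few lines of Python. This is only an illustration of the idea, not how ViCoS is implemented: the class and method names are hypothetical, and relation types are modelled as free-form strings.

```python
from collections import defaultdict

# Illustrative sketch only; names and structure are assumptions.
class ConceptualSpace:
    """Stores typed, directed relations between lexical entries."""

    def __init__(self):
        self._outgoing = defaultdict(list)

    def relate(self, source, relation, target):
        """Record that `source` stands in `relation` to `target`."""
        self._outgoing[source].append((relation, target))

    def related(self, entry, relation=None):
        """Entries linked from `entry`, optionally of one relation type."""
        return [target for (rel, target) in self._outgoing.get(entry, [])
                if relation is None or rel == relation]


space = ConceptualSpace()
space.relate("dog", "is_a", "animal")   # a 'universal' relation type
space.relate("dog", "eats", "bone")     # a lexicon-specific relation type
```

Calling `space.related("dog", "eats")` then returns only the entries reached through the lexicon-specific relation.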

LEXUS and ViCoS training and support

Recently we held a LEXUS/ViCoS training session at the Winter School Saami Language Documentation and Revitalization in Bodø, Norway. Some 25 participants were trained in creating lexica, adding multimedia fragments, customizing views and creating conceptual spaces. Although the training was basic and could not cover the full functionality of LEXUS and ViCoS, most users were enthusiastic about the tools and registered as LEXUS users after the training.

At the Saami Winter School (photo by Lena Karvovskaya)

If you are interested in using the tools, you may request a LEXUS user account by sending an e-mail to Jacquelijn Ringersma. We offer regular LEXUS and ViCoS training during the DoBeS training weeks, and at summer schools and language documentation workshops.

Archiving workshop in India

by Jacquelijn Ringersma

From February 5 to February 8, there was a workshop on documentation and archiving in Guwahati, Assam (India). 22 participants were trained in the recording of audio and video, handling of audio and video files, and use of the LAT software. Two members of the MPI’s technical group were among the workshop trainers.

Participants trying out the video equipment

The archiving workshop was organised by DoBeS, in collaboration with Guwahati University and the Phonogrammarchiv (Austria). Its purpose was to train local linguists in best practices and current methods of documenting languages and cultures. The workshop was financed by the Volkswagen Foundation, within the framework of the DoBeS project The Traditional Songs and Poetry of Upper Assam.

Strengthen local capacity

The project aims at multifaceted linguistic and ethnographic documentation of the Tangsa, Tai and Singpho communities in Margherita (North-East India). The Guwahati workshop contributes to the project by strengthening local capacity. Among the trainees were students and PhDs of Guwahati University and staff members of the National Folklore Support Centre (NFSC).