The AVATecH project is now ready to share its initial results with the research community. The first recognizers are being tested by MPI researchers, and their valuable feedback is recorded to help us improve our work further and deliver tools that can save researchers a lot of time.
To spread the word about AVATecH and get more researchers interested, we have created a short video that introduces the main ideas of the project and shows some of our results.
The video is in German. English subtitles should appear automatically; if they do not, click the small CC button at the bottom of the player.
The CLARA Summer School on Infrastructure Tool Development took place at the Max Planck Institute for Psycholinguistics from 5 to 12 July.
Participants came from several institutions, including the University of Bielefeld, the Technical University of Aachen, Gießen University and the Technical School of Mittelhessen. Some Max Planck staff members also took part in parts of the summer school, especially those requiring less technical expertise. Together they formed an inspiring and productive group that completed the tasks planned for the event and also came up with new ideas for useful tools, which were likewise implemented during the summer school.
On the first day Przemek Lenkiewicz opened the summer school and introduced participants to the agenda and all extra activities. Participants were also encouraged to present themselves and their work, explaining how they use ELAN and what they were hoping to learn at this event.
Later, Han Sloetjes, the main developer of ELAN, presented the annotation tool and introduced its mechanisms for creating and integrating extensions (recognizers). Some users said that although they had used ELAN for quite a long time, they had not been aware that its functionality could be extended, let alone that doing so is this simple. Han spent the whole day with the participants to clear up any questions they had. He also joined on the following days and took part in the development sessions.
Stefano Masneri with participants
Days 2 to 4 of the event were devoted to signal processing techniques. Stefano Masneri of Fraunhofer HHI Berlin and Dr. Rolf Bardeli of Fraunhofer IAIS Sankt Augustin introduced the participants to the basics of video and audio processing. In the afternoon hands-on sessions, participants developed some simple video and audio processing algorithms, such as histogram calculations for both audio and video, color-to-greyscale conversion and image flipping. More advanced functionality was developed as well, such as detecting a person's hand in a video using an edge detector as a starting point, or detecting fricatives in a speech recording by thresholding.
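The introductory image exercises can be sketched in a few lines of plain Python. This is an illustrative sketch, not the code written at the summer school: it assumes an image stored as a list of rows of (R, G, B) tuples with values 0 to 255, and it uses the ITU-R BT.601 luma weights (0.299, 0.587, 0.114) as one common choice for greyscale conversion. The function names are hypothetical.

```python
# Sketch only: an image is assumed to be a list of rows, each row a list of
# (R, G, B) tuples with integer values 0..255.

def to_greyscale(image):
    """Convert RGB pixels to 8-bit grey values using BT.601 luma weights."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in image
    ]

def flip_horizontal(image):
    """Mirror an image left-to-right by reversing each row."""
    return [list(reversed(row)) for row in image]

def histogram(values, bins=256, max_value=255):
    """Count how many values (0..max_value) fall into each of `bins` equal-width bins."""
    counts = [0] * bins
    for v in values:
        index = min(v * bins // (max_value + 1), bins - 1)
        counts[index] += 1
    return counts

# Usage: a 2x2 image with red, green, blue and white pixels.
image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]
grey = to_greyscale(image)                       # [[76, 150], [29, 255]]
flipped = flip_horizontal(grey)                  # [[150, 76], [255, 29]]
counts = histogram([v for row in grey for v in row], bins=4)  # [1, 1, 1, 1]
```

The same `histogram` function works for audio once samples are mapped to a non-negative integer range; that mapping is left out here.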
The last two days of the summer school were led by Przemek Lenkiewicz and Eric Auer. In a brainstorming session, the participants defined two recognizers they were interested in developing: one that automatically imports eye-tracking data into ELAN and represents it as annotations and curves, and one that compares two tiers based on the similarity of their annotations. Both recognizers were successfully developed by the end of the summer school.
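One possible way to compare two tiers is sketched below. It is an assumption, not the recognizer built at the summer school: each annotation is paired with the annotation on the other tier that overlaps it most in time, and the two values are scored with `difflib.SequenceMatcher.ratio()` from the Python standard library (1.0 means identical strings). The `Annotation` type and `compare_tiers` function are hypothetical.

```python
from difflib import SequenceMatcher
from typing import List, NamedTuple, Optional, Tuple

class Annotation(NamedTuple):
    start: int   # start time in milliseconds
    end: int     # end time in milliseconds
    value: str

def overlap(a: Annotation, b: Annotation) -> int:
    """Length of the time interval shared by two annotations (0 if disjoint)."""
    return max(0, min(a.end, b.end) - max(a.start, b.start))

def compare_tiers(
    tier_a: List[Annotation], tier_b: List[Annotation]
) -> List[Tuple[Annotation, Optional[Annotation], float]]:
    """Pair each annotation in tier_a with the most-overlapping one in tier_b
    and score the similarity of their values (0.0 if nothing overlaps)."""
    results = []
    for a in tier_a:
        best = max(tier_b, key=lambda b: overlap(a, b), default=None)
        if best is None or overlap(a, best) == 0:
            results.append((a, None, 0.0))
        else:
            score = SequenceMatcher(None, a.value, best.value).ratio()
            results.append((a, best, score))
    return results

tier_a = [Annotation(0, 1000, "hello"), Annotation(2000, 3000, "world")]
tier_b = [Annotation(100, 900, "hello"), Annotation(5000, 6000, "bye")]
for a, b, score in compare_tiers(tier_a, tier_b):
    print(a.value, b.value if b else None, score)
```

Pairing by maximum overlap is a simple choice; a real tool might also weigh how well the time boundaries agree.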
Przemek Lenkiewicz and Eric Auer
Since the summer school spanned a weekend, the group also met up and explored Nijmegen together. On Monday, 11 July, we had dinner together at a nice Dutch restaurant.
Additional pictures from the event can be found on this web page.
After the event, participants filled in a survey and rated the summer school highly for its content, the way it was delivered and its overall organization. Given this positive feedback, another Summer School on Infrastructure Tool Development may take place at the Max Planck Institute in summer 2012. Anyone interested in participating should contact Przemek Lenkiewicz.
The AVATecH project is a joint initiative of the Max Planck Gesellschaft and the Fraunhofer Gesellschaft. It aims to develop solutions for automatically annotating media recorded by linguistic researchers. Such tools are highly desired, so expectations are high.
The project has recently passed two very important milestones. The first was in November, when the AVATecH Expert Workshop took place. For two days the project participants interacted with each other and with potential users of their solutions, presenting the status of the development and integration of their work and gathering feedback and further suggestions from linguists. Experts from other fields (audio/video processing, gesture and sign language research, field research) also attended to see the state of the work and get an idea of what might soon be available for their purposes. Naturally, they contributed numerous valuable comments.
After the status of the work had been presented and suggestions gathered, all project participants continued working on their solutions, and the second milestone was reached: delivering the first automated annotation functionality to the ELAN tool and making it available to Max Planck researchers. This functionality covers the following initial capabilities:
The audio part aims to provide functionality that is needed in most annotation work: detecting how many persons are speaking in an audio recording and creating the corresponding number of tiers; detecting who is speaking when and creating annotations for the appropriate parts of the recording; and aligning the recording with a transcription from a text file.
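As a rough illustration of the "who is speaking when" task, the sketch below only answers the simpler question of *when* someone is speaking, using frame-wise RMS energy and a fixed threshold. This is an assumed, deliberately simplified stand-in, not the AVATecH audio recognizer: attributing segments to individual speakers (diarization) requires considerably more. Samples are assumed to be floats in the range -1.0 to 1.0; `detect_speech_segments`, `frame_ms` and `threshold` are hypothetical names and values.

```python
import math
from typing import List, Sequence, Tuple

def detect_speech_segments(
    samples: Sequence[float],
    sample_rate: int,
    frame_ms: int = 20,
    threshold: float = 0.1,
) -> List[Tuple[float, float]]:
    """Return (start_s, end_s) intervals whose frame RMS energy is >= threshold."""
    frame_len = max(1, sample_rate * frame_ms // 1000)
    segments = []
    start = None
    for i in range(0, len(samples), frame_len):
        frame = samples[i:i + frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        t = i / sample_rate
        if rms >= threshold and start is None:
            start = t                      # a loud region begins
        elif rms < threshold and start is not None:
            segments.append((start, t))    # the loud region ended at this frame
            start = None
    if start is not None:                  # signal ended while still loud
        segments.append((start, len(samples) / sample_rate))
    return segments

# 0.1 s of audio at 1 kHz: silence, a burst of "speech", silence.
signal = [0.0] * 50 + [0.5] * 30 + [0.0] * 20
print(detect_speech_segments(signal, sample_rate=1000, frame_ms=10))
```

Each returned interval could then become one annotation on a tier; a real detector would also smooth over short pauses instead of splitting on every quiet frame.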
The video part provides the following functionality: detecting shots and subshots in the recording; creating representative keyframes for given shots and subshots; estimating the color ranges that represent human skin in the recording; and tracking the positions of the speaker's hands and head. Further functionality will be built on top of the last of these recognizers: the positions of the hands and head, together with timing information, will be used to estimate the speed of hand movements, the position of the hands relative to each other and to the speaker's body, and so on.
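The planned derived measures can be sketched from tracked positions and timestamps. This is a minimal sketch under stated assumptions: the tracker is assumed to output, per hand, a time-sorted list of `(time_s, x, y)` image coordinates, so speeds come out in pixels per second and distances in pixels. The function names are hypothetical and not part of the AVATecH recognizers.

```python
import math
from typing import List, Tuple

Position = Tuple[float, float, float]  # (time in seconds, x, y) in image pixels

def hand_speeds(track: List[Position]) -> List[Tuple[float, float]]:
    """Speed (pixels/second) between consecutive tracked positions,
    reported at the time of the later sample."""
    speeds = []
    for (t0, x0, y0), (t1, x1, y1) in zip(track, track[1:]):
        dt = t1 - t0
        if dt <= 0:          # skip duplicate or out-of-order timestamps
            continue
        speeds.append((t1, math.hypot(x1 - x0, y1 - y0) / dt))
    return speeds

def distance(p: Tuple[float, float], q: Tuple[float, float]) -> float:
    """Euclidean distance between two points, e.g. a hand and the head,
    or the left and right hand in the same frame."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

right_hand = [(0.0, 0, 0), (0.5, 3, 4), (1.0, 3, 4)]
print(hand_speeds(right_hand))          # moves 5 px in 0.5 s, then stands still
print(distance((3, 4), (0, 0)))         # hand-to-head distance in one frame
```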
The MPI team is currently working on integrating these features with ELAN and providing manuals for researchers on how to use them.
Toward the end of last year a new version of ELAN was released, containing many new features and improvements, a new media player for Windows, and fixes for a number of issues and bugs in previous versions.
A first implementation of interaction with LEXUS, the MPI-developed web-based tool for creating and editing lexical databases, has been added. A new lexicon viewer allows the user to look up values in an online lexicon and apply a value to the selected annotation.
ELAN has faced many codec-related problems, especially with MPEG-1 and MPEG-2 files. To eliminate some of them, a new player for Windows has been developed based on DirectShow (JDS, Java-Direct Show).
To use this player, first select it in the Platform/OS tab of the “Edit Preferences” window.
This version extends support for controlled vocabularies to externally defined closed controlled vocabularies (located, e.g., on the web). Controlled vocabularies can now also be imported from .txt and .csv files. Externally defined closed controlled vocabularies are stored in .ecv files, a format similar to .eaf.
To make life easier and speed up work for ELAN users, several improvements reduce the number of steps and clicks needed for common tasks. Several tier-based operations, such as removing multiple annotations or annotation values from selected tiers, or recursively creating dependent annotations on all dependent tiers, are now much faster and easier to perform. It is also possible to create dependent annotations automatically when an annotation is created on a tier that has dependent tiers. The merge transcriptions function has been extended with an option to append one file to another, making merging more versatile.
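The recursive creation of dependent annotations can be pictured as a walk down the tier hierarchy. The sketch below is a simplified model, not ELAN's implementation: it assumes each tier simply lists its dependent tiers and that every dependent annotation copies its parent's time span with an empty value, ignoring ELAN's different tier types and their constraints. `Tier` and `create_annotation` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Tier:
    name: str
    dependents: List["Tier"] = field(default_factory=list)
    annotations: List[Tuple[int, int, str]] = field(default_factory=list)

def create_annotation(tier: Tier, start: int, end: int, value: str = "") -> None:
    """Add an annotation to `tier`, then recursively add an empty annotation
    with the same time span to every tier that depends on it."""
    tier.annotations.append((start, end, value))
    for child in tier.dependents:
        create_annotation(child, start, end)

# A three-level hierarchy: words -> morphemes -> glosses.
gloss = Tier("gloss")
morph = Tier("morph", dependents=[gloss])
words = Tier("words", dependents=[morph])

create_annotation(words, 0, 500, "hello")
print(morph.annotations, gloss.annotations)  # both receive an empty (0, 500) annotation
```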
Support for audio and video recognizers, such as those developed in the AVATecH project, has been extended. To learn more about this project, visit the AVATecH website.
You can download the new version at the ELAN web site where you will also find the updated manual detailing how to use the new functionalities.
The AVATecH project (Advancing Video Audio Technology in Humanities Research) aims at investigating, developing and applying advanced technology for semi-automatic annotation of audio-visual recordings collected for humanities research. Currently, even the simplest annotations of, for example, recorded dialogs take too much time and effort. By making the annotation process more efficient through the use of automatic detectors, more data can be annotated, opening up new possibilities for search and corpus analysis and supporting better theory building.
Initial research will focus on creating detector components that, given media recordings, generate lists of segments and annotations. Such detectors can be invoked from within annotation tools such as the widely used and proven ELAN software, and from a batch-processing framework that processes a number of recordings in one go.
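The detector idea can be made concrete with a small interface sketch. This is an assumption for illustration only, not the AVATecH component interface: a detector is modeled as a function that takes a recording path and returns `(start_s, end_s, label)` segments, and a batch runner applies it to many recordings, recording per-file errors instead of aborting. `run_batch` and `dummy_detector` are hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Segment = Tuple[float, float, str]          # (start in s, end in s, label)
Detector = Callable[[str], List[Segment]]   # recording path -> segments

def run_batch(
    detector: Detector, recordings: Iterable[str]
) -> Tuple[Dict[str, List[Segment]], Dict[str, str]]:
    """Apply one detector to every recording. A failure on one file is
    recorded in `errors` and does not stop the rest of the batch."""
    results: Dict[str, List[Segment]] = {}
    errors: Dict[str, str] = {}
    for path in recordings:
        try:
            results[path] = detector(path)
        except Exception as exc:
            errors[path] = str(exc)
    return results, errors

def dummy_detector(path: str) -> List[Segment]:
    """Hypothetical stand-in that 'detects' one segment in .wav files only."""
    if path.endswith(".wav"):
        return [(0.0, 1.0, "speech")]
    raise ValueError("unsupported format")

results, errors = run_batch(dummy_detector, ["a.wav", "b.mov"])
print(results, errors)
```

Because an interactive tool would call the same detector function on a single file, one implementation can serve both ELAN-style interactive use and batch processing.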
The project is organized in two major phases:
1. First, "low-hanging fruit" detectors will be identified that can operate on a selected collection of typical audio/video material. They will be integrated into ELAN so that the developers can interact with researchers during the evaluation.
2. Second, more advanced and complex detector tasks will be tackled once the results of the low-hanging-fruit detectors have been evaluated.
Head and Hands Tracking
The detectors developed will be made available via interactive annotation tools and batch processing. In this project, two Max Planck Institutes (the MPI for Psycholinguistics in Nijmegen and the MPI for Social Anthropology in Halle) and two Fraunhofer Institutes (the Fraunhofer Institute for Intelligent Analysis and Information Systems IAIS in Sankt Augustin and the Fraunhofer Heinrich Hertz Institute HHI in Berlin) are cooperating in different capacities. The Max Planck Institutes act as experts for the research-driven questions arising from the analysis of the AV material and for user-friendly interaction tools. The Fraunhofer Institutes act as experts for digital sound and video processing methods. More information on AVATecH can be found on the project's homepage.
Many compression algorithms in current video codecs are lossy: to reduce bandwidth, they discard information that is considered less relevant to the perception of the image. This information cannot be reconstructed afterwards, which is one reason why archives like ours avoid lossy compression formats, since one never knows whether the discarded information might be relevant for future use of the data. Another reason is that every time compression is applied to an already lossy-compressed signal, the signal degrades. This happens both when re-encoding with the same compression algorithm and when transcoding to a new (future) compression standard. Since codecs and file formats generally have a limited lifetime, an archive that wants to keep its content interpretable in the long run would see that content degrade over time through the conversions needed to keep up with the state of the art.
For these reasons, archives generally want to preserve data in as uncompressed a form as possible. The rapid deterioration of physical audiovisual carriers, such as celluloid film and many deprecated video formats, has prompted broadcast and film archives, as well as the film industry, to start massive digitization projects. Given the high cost of these operations and the high economic or cultural value of the material, it makes sense to store the digitized material at the highest possible quality: even if such digitization could be repeated in the future (e.g. to accommodate new compression standards), doing so would be a big waste of time and money. Storage costs for these formats are substantial at the moment but will decrease as newer, higher-capacity storage technology is introduced.
A codec that is currently widely used in the film and video archiving world is Motion JPEG 2000. This codec allows lossless, i.e. reversible, compression of moving images. It is also the standard for Digital Cinema, although currently in a lossy variant. In the lossless variant, compression ratios of about 1:2 can be achieved, but the main reason for moving to lossless or uncompressed storage of video is, as stated, to prevent future degradation of the signal when current codecs become obsolete.
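The difference between lossless (reversible) and lossy compression can be demonstrated with a toy example. Note that this does not use Motion JPEG 2000: it uses Python's standard `zlib` module (DEFLATE) as a general-purpose lossless compressor, and a hypothetical "codec" that quantizes each byte to a multiple of `step` as a stand-in for lossy compression. Compression ratios on this synthetic data say nothing about real video.

```python
import zlib

def lossless_roundtrip(data: bytes) -> bytes:
    """Compress and decompress with zlib (DEFLATE), a lossless method."""
    return zlib.decompress(zlib.compress(data, 9))

def lossy_roundtrip(data: bytes, step: int = 8) -> bytes:
    """Toy lossy 'codec': quantize each byte down to a multiple of `step`.
    The discarded remainders cannot be recovered afterwards."""
    return bytes((b // step) * step for b in data)

frame = bytes(range(256)) * 4          # a synthetic 1024-byte "frame"

restored = lossless_roundtrip(frame)
approx = lossy_roundtrip(frame)

print(restored == frame)               # the original is recovered exactly
print(approx == frame)                 # information has been thrown away
```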
The MPI is currently digitizing a large collection of valuable videotapes from the German behavioral scientist Irenäus Eibl-Eibesfeldt. These recordings were originally made on 16mm film and were transferred to Betacam SP (broadcast quality) video, which was a costly and time-consuming process. The Betacam SP tapes are now being digitized in lossless MJPEG2000 format to retain the highest possible quality for this material. In addition, MPEG-2 and H.264 distribution copies are created to make the material more accessible over lower-bandwidth connections.