Archive for June 2010

 
 

Embeddable Annex

by Thomas Koller

The MPI developers recently released a new Annex feature that lets users embed a smaller, customised version of Annex in any web page. The feature has been warmly welcomed by researchers inside and outside our institute, as it makes it easy to show research results to outsiders.

The embeddable version of Annex only supports access to freely accessible annotation resource bundles, i.e. resource bundles which can be accessed from the IMDI browser without user login. This restriction helps to avoid authentication issues and effectively protects resources with restricted access.

The feature is accessed directly from Annex by clicking ‘embed’ in the menu. A small dialog then pops up in which the user can customise the HTML snippet before copying it to the clipboard and pasting it into a web page. This works much like the similar YouTube feature, which users may already be familiar with. The following options are available:

  • Show border around embedded Annex application: the creator can select a border width, a border color and a border type (solid, dotted or dashed)
  • Size of the embedded Annex application: 4 predefined sizes are available. The user can also set a custom size directly in the HTML markup. Note, however, that the layout and component sizes of the embedded Annex application have been optimised for the 4 predefined sizes, so a custom size set in the HTML snippet may result in a less-than-optimal appearance.
  • Default view: text or subtitle. Any other default view (such as timeline or grid) is ignored, and the ‘text’ view is used instead.
  • Tier text font: This setting may be helpful if the user wants the embedded Annex to display an annotation resource with special characters which may not be contained in a standard font on the user’s computer. If the ‘Tier text font’ parameter is set with a font name which is not available on the user’s computer, then the embedded Annex application will automatically fall back to a standard font. The end user also has the option to change the tier text font and the font size at any time via a dropdown list.
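
To make the options above concrete, here is a minimal Python sketch of how such a snippet could be assembled. The embed URL, the query parameter names (`resource`, `view`, `font`) and the use of an `iframe` are assumptions made purely for illustration; the actual markup is whatever the Annex embed dialog generates. The only behaviour taken from the description above is the view fallback: any default view other than text or subtitle is replaced by ‘text’.

```python
from html import escape
from urllib.parse import urlencode

# Hypothetical placeholders: the real URL and parameter names are
# produced by the Annex embed dialog and may look quite different.
EMBED_URL = "https://example.org/annex/embed"
ALLOWED_VIEWS = {"text", "subtitle"}
BORDER_STYLES = {"solid", "dotted", "dashed"}


def effective_view(requested):
    """Return the default view that would actually be used.

    Only 'text' and 'subtitle' are honoured; anything else
    (for example 'timeline' or 'grid') falls back to 'text'.
    """
    return requested if requested in ALLOWED_VIEWS else "text"


def build_snippet(resource, width, height, view="text", font=None,
                  border_width=0, border_color="#000000",
                  border_style="solid"):
    """Assemble an illustrative HTML snippet for an embedded viewer."""
    if border_style not in BORDER_STYLES:
        raise ValueError(f"unsupported border style: {border_style}")
    params = {"resource": resource, "view": effective_view(view)}
    if font:
        params["font"] = font
    src = EMBED_URL + "?" + urlencode(params)
    style = f"border: {border_width}px {border_style} {border_color};"
    # escape() protects attribute values (it also turns & into &amp;).
    return (f'<iframe src="{escape(src)}" width="{width}" '
            f'height="{height}" style="{escape(style)}"></iframe>')


if __name__ == "__main__":
    print(build_snippet("node123", 640, 480, view="timeline",
                        border_width=2, border_style="dashed"))
```

Calling `build_snippet` with `view="timeline"` yields a snippet whose view parameter is `text`, mirroring the fallback rule in the list above.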

The embedded Annex application has a Start Full ANNEX button in its top right corner. When the end user clicks this button, a new browser tab will open the full Annex version showing the same annotation resource.

The CLARIN-NL metadata tutorial

by Dieter van Uytvanck

On Friday May 27, about 25 people gathered at the Max Planck Institute in Nijmegen to attend a workshop on the practical use of the Component Metadata Infrastructure (CMDI) for the description of language resources. CMDI is the metadata part of CLARIN, a European initiative to create a Common Language Resources and Technology Infrastructure.

After a short introduction to metadata in general and a brief historical sketch, the concepts behind CMDI were introduced: the core ideas behind the new metadata format are modularity, reusability, and the use of data categories. A special session was dedicated to the use of ISOcat, the reference implementation of a data category registry. The idea is to have a dependable definition of what is meant by a data category such as Part of Speech. This way, no matter what you call it or how you spell it in your particular metadata schema, the connection to similar schemata is always clear.
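
The data category idea can be illustrated with a small Python sketch. The two schemas, their field names and the `datcat:` identifiers below are invented for this example and are not actual ISOcat entries; the point is only that fields with different names become comparable once both point to the same registered category.

```python
# Hypothetical identifiers standing in for entries in a data category
# registry; a real registry would issue its own persistent identifiers.
PART_OF_SPEECH = "datcat:part-of-speech"
LANGUAGE_NAME = "datcat:language-name"

# Two made-up metadata schemas that name the same concepts differently.
SCHEMA_A = {"POS": PART_OF_SPEECH, "Lang": LANGUAGE_NAME}
SCHEMA_B = {"WordClass": PART_OF_SPEECH, "Language": LANGUAGE_NAME}


def align_fields(schema_a, schema_b):
    """Pair up field names that refer to the same data category."""
    by_category = {category: name for name, category in schema_b.items()}
    return {
        name: by_category[category]
        for name, category in schema_a.items()
        if category in by_category
    }


if __name__ == "__main__":
    print(align_fields(SCHEMA_A, SCHEMA_B))
```

Here `POS` in the first schema is matched with `WordClass` in the second, even though the names share nothing, because both reference the same category.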

After these more general introductions, the specific CMDI software was presented.

First, the Component Registry was shown. It is a web application for inspecting, searching, creating and editing CMDI metadata components. Next, it was illustrated how to create CMDI metadata files using a version of Arbil that has been modified to interact directly with the Component Registry. Both Arbil and the Component Registry are developed by the Max Planck Institute for Psycholinguistics and were presented by their respective developers. Although both applications are still under development, it was clear that they can already be used to produce CMDI metadata.

All slides from the presentations can be downloaded from the CLARIN-NL website.

More information about CMDI, including links to the software so you can try it out yourself, can be found on the main CLARIN site.

ANNEX and ELAN – A Comparison

by Thomas Koller and Han Sloetjes

ANNEX and ELAN are two closely related applications designed for handling digital media files and their associated annotation files. While ELAN is a desktop application used to create rich annotations on audio and video recordings, ANNEX is a web-based viewer for studying annotated resources once they have been properly stored on the archive server.

This short article highlights which features the two tools have in common and which features are unique to each.

ELAN is a local tool (desktop application) for the creation of annotations on audio and/or video recordings. It combines a media player with viewer and editor components for annotations. The annotation documents are stored in the XML-based ELAN Annotation Format (EAF). ELAN is written in the Java programming language and is available for Windows, Mac OS X and Linux. On Windows and Mac, media playback is delegated to an available high-performance native media framework: DirectX/DirectShow on Windows and QuickTime on Mac. On Linux, JMF is used. The list of supported file types depends on the available media player frameworks.
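
Because EAF is XML-based, annotation documents can be read with any XML library. The following Python sketch uses the standard library to list time-aligned annotations. The sample document is hand-written and heavily trimmed (a real EAF file carries a header, linguistic types and more, and this fragment would not validate against the full schema); the element and attribute names follow the EAF format as commonly documented, with time values in milliseconds.

```python
import xml.etree.ElementTree as ET

# A tiny, hand-written fragment using EAF element names. It is only a
# stand-in for a real file exported by ELAN.
SAMPLE_EAF = """<ANNOTATION_DOCUMENT>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="0"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="1250"/>
    <TIME_SLOT TIME_SLOT_ID="ts3" TIME_VALUE="2980"/>
  </TIME_ORDER>
  <TIER TIER_ID="utterance">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1"
          TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>hello</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a2"
          TIME_SLOT_REF1="ts2" TIME_SLOT_REF2="ts3">
        <ANNOTATION_VALUE>world</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>"""


def read_aligned_annotations(eaf_text):
    """Return (tier_id, start_ms, end_ms, text) for each aligned annotation."""
    root = ET.fromstring(eaf_text)
    # Map each time slot id to its value in milliseconds.
    slots = {
        slot.get("TIME_SLOT_ID"): int(slot.get("TIME_VALUE"))
        for slot in root.iter("TIME_SLOT")
        if slot.get("TIME_VALUE") is not None
    }
    rows = []
    for tier in root.iter("TIER"):
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            start = slots.get(ann.get("TIME_SLOT_REF1"))
            end = slots.get(ann.get("TIME_SLOT_REF2"))
            text = ann.findtext("ANNOTATION_VALUE", default="")
            rows.append((tier.get("TIER_ID"), start, end, text))
    return rows


if __name__ == "__main__":
    for row in read_aligned_annotations(SAMPLE_EAF):
        print(row)
```

The sketch deliberately ignores annotations that refer to other annotations rather than to time slots; it only covers the time-aligned case.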

ELAN main window

Although there is limited support for streaming media via the RTSP protocol, most commonly the media files are accessed directly on a local hard drive or the local network. This guarantees high accuracy in media playback, especially in (repeated) playback of fragments of the media, which is usually a basic step in the process of segmenting the media. The annotation boundaries can be determined with millisecond precision. ELAN supports simultaneous, synchronized playback of up to 4 video files. The annotation documents are stored locally as well. The variant of the TROVA search engine that is distributed with ELAN can query the contents of physical directory structures. To that end it creates temporary in-memory indexes for the content of selected folders and files. The search is limited to EAF files. The ELAN window offers several customizable views on the annotation data, all synchronized with the media player. All viewers are editors at the same time. Many operations are provided for manipulating tiers and annotations.
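
The idea of a temporary in-memory index over selected EAF files can be sketched as follows. This is not how TROVA is implemented; it is a deliberately simple word index, written under the assumption that annotation text sits in `ANNOTATION_VALUE` elements inside `TIER` elements. The helper `make_doc` and the file names are invented for the demonstration, and in real use the XML text would be read from the selected folders.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict


def make_doc(tier_id, values):
    """Build a stripped-down EAF-like document for the demo below."""
    annotations = "".join(
        "<ANNOTATION><ALIGNABLE_ANNOTATION>"
        f"<ANNOTATION_VALUE>{value}</ANNOTATION_VALUE>"
        "</ALIGNABLE_ANNOTATION></ANNOTATION>"
        for value in values
    )
    return (f'<ANNOTATION_DOCUMENT><TIER TIER_ID="{tier_id}">'
            f"{annotations}</TIER></ANNOTATION_DOCUMENT>")


def build_index(documents):
    """Map each lower-cased word to the (file, tier, text) hits containing it.

    `documents` maps a file name to the XML text of that file.
    """
    index = defaultdict(set)
    for name, xml_text in documents.items():
        root = ET.fromstring(xml_text)
        for tier in root.iter("TIER"):
            for value in tier.iter("ANNOTATION_VALUE"):
                text = value.text or ""
                for word in re.findall(r"\w+", text.lower()):
                    index[word].add((name, tier.get("TIER_ID"), text))
    return index


def search(index, word):
    """Return all hits for a single word, sorted for stable output."""
    return sorted(index.get(word.lower(), ()))


# Invented sample data standing in for files in a selected folder.
DOCS = {
    "a.eaf": make_doc("words", ["the dog barks", "a cat"]),
    "b.eaf": make_doc("gloss", ["a dog"]),
}

if __name__ == "__main__":
    for hit in search(build_index(DOCS), "dog"):
        print(hit)
```

The index lives only in memory and is rebuilt on each run, which matches the temporary nature described above but says nothing about the query features the real search engine offers.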

ANNEX is an ELAN-compliant, browser-based tool (web application) that supports media playback via HTTP pseudostreaming and the Flash Player browser plugin. For freely accessible language resources, ANNEX can also be embedded in any web page by pasting a simple HTML snippet into the page (comparable to the way YouTube supports embedding of videos into web pages). Alongside the media player, it contains several customizable viewer components for annotations. By default, both the media files and the annotation files are streamed from the MPI online archive; there is no need to download files in order to view their contents. ANNEX is seamlessly integrated with the archive access management tools and interacts with available web services, for example those exposed by the lexicon tool LEXUS. Other tools can in turn make parameterized calls to ANNEX.
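
With pseudostreaming, a player can generally only start decoding at a keyframe, which is why the comparison matrix lists ANNEX's playback precision as depending on the keyframe rate. The Python sketch below illustrates the effect under simplifying assumptions: keyframes at a fixed interval and a request that is rounded down to the preceding keyframe. The durations and intervals are invented numbers, and this is not a description of the actual ANNEX server logic.

```python
from bisect import bisect_right


def keyframe_times(duration_ms, interval_ms):
    """Return keyframe positions, assuming one keyframe per fixed interval."""
    return list(range(0, duration_ms, interval_ms))


def snap_to_keyframe(requested_ms, keyframes):
    """Return the latest keyframe at or before the requested position.

    `keyframes` must be sorted in ascending order and start at 0.
    """
    position = bisect_right(keyframes, requested_ms)
    return keyframes[max(position - 1, 0)]


if __name__ == "__main__":
    # Invented example: a 10 second clip with a keyframe every 2 seconds.
    keyframes = keyframe_times(10_000, 2_000)
    requested = 4_730
    actual = snap_to_keyframe(requested, keyframes)
    print(f"requested {requested} ms, playback starts at {actual} ms")
```

Under these assumptions the start position can be off by up to one keyframe interval, whereas a local player working directly on the file can position to the millisecond.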

ANNEX works with the online version of TROVA, which creates an index for a whole LAMUS archive using the Postgres database system. This version of TROVA supports not only EAF but also Shoebox, CHAT and generic XML, HTML and text files.

Comparison Matrix
| Feature | ANNEX | ELAN |
|---|---|---|
| Number of synchronized videos | 1 | 4 |
| Media file types | MPG for video files, WAV for audio files | Depending on the media framework of the particular platform |
| Waveform for audio | .wav only | .wav only |
| Media playback precision | Depends on keyframe rate | milliseconds |
| Streaming media support | Pseudostreaming for audio and video files | Limited, via rtsp |
| Annotation formats | EAF, Toolbox/Shoebox, Chat. Will be converted to a single XML format for transfer. | EAF, import of Toolbox, Chat, Praat, Transcriber, CSV |
| Annotation editing | No | Yes |
| Number of tiers | Unlimited | Unlimited |
| Font usage | Any font available on the system | Any font available on the system supported by Java |
| Search options | TROVA search engine, search in entire (accessible part of) archive | Single file search and multiple file search (TROVA) in local corpus |
| Technology | Flash, XML, Quicktime (temporarily for resources with master audio file) | Java, XML |
| Tool interaction, API | Support for parameterized calls to ANNEX | Extension mechanism for particular parts of the application |