2002
-
LREP: A Language Repository Exchange Protocol
-
The recent increase in the number and complexity of the language resources available on the Internet has been accompanied by a similar increase in the tools available for linguistic analysis. Ideally, the user should not have to be confronted with the question of how to match tools with resources. If resource repositories and tool repositories offer adequate metadata and a suitable exchange protocol is developed, this matching process could be performed (semi-)automatically.
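The matching idea can be illustrated with a minimal sketch. It assumes that resources and tools each publish a few metadata fields; the field names, format labels, and the `match_tools` helper are hypothetical and are not taken from LREP itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    name: str
    media_format: str       # hypothetical metadata field
    annotation_format: str  # hypothetical metadata field


@dataclass(frozen=True)
class Tool:
    name: str
    accepted_media: frozenset        # media formats the tool declares it can read
    accepted_annotations: frozenset  # annotation formats the tool declares it can read


def match_tools(resource, tools):
    """Return the names of tools whose declared inputs cover the resource's formats."""
    return [
        tool.name
        for tool in tools
        if resource.media_format in tool.accepted_media
        and resource.annotation_format in tool.accepted_annotations
    ]


# Illustrative data only; none of these names come from the paper.
resource = Resource("session-01", "video/mpeg", "eaf")
tools = [
    Tool("annotator", frozenset({"video/mpeg", "audio/wav"}), frozenset({"eaf"})),
    Tool("concordancer", frozenset({"text/plain"}), frozenset({"txt"})),
]
matches = match_tools(resource, tools)
print(matches)
```

In this sketch a tool matches only when both metadata fields agree; a real protocol would presumably negotiate over a richer description.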
-
Metadata Set and Tools for Multimedia/Multimodal Language Resources
-
Within the ISLE project (International Standards for Language Engineering), the ISLE Metadata Initiative (IMDI) developed a complete environment for creating, maintaining and using metadata descriptions for multimedia/multimodal language resources. This environment includes a proposal for a suitable metadata set, tools to create, browse and search in IMDI metadata domains, and suggestions about how to organize centers acting as metadata repositories. Building on the IMDI approach, a formulation in RDF is intended that will enable the IMDI set to be integrated into Semantic Web activities.
-
Cross-Linguistic Studies of Multimodal Communication
-
Gestures are culture-specific forms of arm movements which are used in communication to transfer information to the listener, to guide the planning of the speech production process, and to disambiguate the incoming speech. To understand the underlying mechanisms, gestures have to be analyzed cross-linguistically. Large projects are necessary, covering speakers from various cultural backgrounds and many recordings. Such projects can only be carried out successfully when suitable gesture encoding schemes, generic annotation schemes, powerful tools supporting the schemes, and efficient methods for easy resource discovery and management are available. At the Max-Planck-Institute all of these aspects were tackled.
-
Methods of Language Documentation in the DOBES project
-
The DOBES program for the documentation of endangered languages, started in September 2000, has just completed its pilot phase. Eight documentation teams and one archiving team worked out agreements on formats, tools, naming conventions, and encoding, especially the linguistic level of encoding. These standards will form the basis for a five-year main phase, which will include about 20 teams. In the pilot phase, strategies to set up an online archive incorporating redundancy and regular backup were developed and implemented. Ethical and legal aspects of the archiving process were discussed and resulted in a number of documents to which all participants have to adhere. Tools and converters developed within the pilot phase are available to others.
-
Multimedia Annotation with Multilingual Input Methods and Search Support
-
A tool set for creating complex multimedia/multimodal annotations and exploiting them is described. Because tiers can be defined flexibly and associated with languages/writing systems, and characters from different writing systems can even be mixed, the tool is especially suitable for work in multilingual environments. The search interface also supports the multilingual features, allowing users to search for complex patterns in the annotations.
-
Management of Language Resources using Metadata
-
Technology development allows many more researchers than before to create language resources, especially with multimedia extensions. This creates a resource management problem that exceeds the boundaries of established resource centers. Metadata environments such as the one proposed by IMDI, which offer a metadata set and also tools to operate on it, have a strong potential to help individual researchers carry out their resource management tasks. In addition, they allow researchers to easily integrate their resources into a large distributed domain of resources. The work at the Max-Planck-Institute for Psycholinguistics to establish a large multimedia language corpus helped to understand the needs and requirements. Due to this experience the IMDI environment has reached a state of maturity, but some important features still have to be added.
-
Analysis of Lexical Structures from Field Linguistics and Language Engineering
-
Lexica play an important role in every linguistic discipline, and we are confronted with many types of lexica. Depending on the type of lexicon and the language, we are currently faced with a large variety of structures, from very simple tables to complex graphs, as was indicated by a recent overview of structures found in dictionaries from field linguistics and language engineering. It is important to assess these differences and aim at the integration of lexical resources in order to improve lexicon creation, exchange and reuse. This paper describes the first step towards the integration of existing structures and standards into a flexible abstract model.
-
Analysis of Lexical Structures from Field Linguistics and Language Engineering
-
presentation
-
Metadata Tools Supporting Controlled Vocabulary Services
-
Within the ISLE Metadata Initiative (IMDI) project, a user-friendly editor for entering metadata descriptions and a browser operating on the linked metadata descriptions were developed. Both tools support the use of Controlled Vocabulary (CV) repositories: each vocabulary is specified by a URL where the formal CV definition data is available.
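A minimal sketch of how a tool might check a metadata value against a controlled vocabulary follows. The XML layout, the element and attribute names, and the helper functions are hypothetical and do not reproduce the actual IMDI CV definition format. The definition is embedded as a string here so the example runs offline; in practice it would be retrieved from the URL given for the vocabulary.

```python
import xml.etree.ElementTree as ET

# Hypothetical CV definition; a real one would be fetched from the CV's URL.
CV_XML = """<vocabulary name="Genre">
  <entry value="narrative"/>
  <entry value="conversation"/>
</vocabulary>"""


def load_vocabulary(xml_text):
    """Parse a CV definition and return the set of permitted values."""
    root = ET.fromstring(xml_text)
    return {entry.get("value") for entry in root.iter("entry")}


def is_allowed(value, vocabulary):
    """Return True if the value is one of the vocabulary's entries."""
    return value in vocabulary


genre_cv = load_vocabulary(CV_XML)
print(is_allowed("narrative", genre_cv))
print(is_allowed("sermon", genre_cv))
```

An editor could use such a check to offer only the permitted values for a field, while a browser could use it to flag descriptions that fall outside the vocabulary.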
-
Metadata Proposals for Corpora and Lexica
-
A number of metadata proposals appear to be relevant to establishing a searchable and browsable domain of language resources so that users can easily discover suitable resources on the Web. These proposals differ in their approach, in their descriptive detail, in the set of linguistic data types supported by specific elements, and in the supporting tools. The IMDI initiative, in particular, has worked out a set not only for (multimedia) corpora but also for lexica. All initiatives have declared their commitment to interoperability, in which Dublin Core will play a role in the near future. For the long term we foresee much effort to make the metadata sets compliant with the trends of the Semantic Web and to allow increasing reuse of existing sub-schemas and data categories, which will probably be formulated in RDF.
-
Multimodal Annotations in Gesture and Sign Language Studies
-
For multimodal annotations an exhaustive encoding system for gestures was developed to facilitate research. The structural requirements of multimodal annotations were analyzed to develop an Abstract Corpus Model, which is the basis for a powerful annotation and exploitation tool for multimedia recordings and for the definition of the XML-based EUDICO Annotation Format. Finally, a metadata-based data management environment has been set up to facilitate resource discovery and especially corpus management. By means of an appropriate digitization policy and online availability of the data, researchers have been able to build up a large corpus covering gesture and sign language data.