
Language Archiving Technology

2002

2002 LREP: A Language Repository Exchange Protocol
The recent increase in the number and complexity of language resources available on the Internet has been accompanied by a similar increase in the number of tools available for linguistic analysis. Ideally, users should not have to work out for themselves how to match tools with resources. If resource repositories and tool repositories offer adequate metadata and a suitable exchange protocol is developed, this matching process could be performed (semi-)automatically.
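The abstract does not specify the LREP message format, so the following is only a minimal Python sketch of the matching step itself, assuming that both kinds of repository expose simple format and language metadata. The classes `Resource` and `Tool`, their fields, and the functions `tool_accepts` and `match` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Resource:
    # Hypothetical metadata a resource repository might expose.
    name: str
    media_format: str  # e.g. a MIME type
    language: str      # e.g. an ISO 639 language code


@dataclass
class Tool:
    # Hypothetical metadata a tool repository might expose.
    name: str
    accepted_formats: List[str]
    languages: Optional[List[str]] = None  # None = language-independent


def tool_accepts(tool: Tool, res: Resource) -> bool:
    """True if the tool's declared capabilities cover the resource's metadata."""
    if res.media_format not in tool.accepted_formats:
        return False
    return tool.languages is None or res.language in tool.languages


def match(tools: List[Tool], res: Resource) -> List[str]:
    """Names of all tools that can process the given resource."""
    return [t.name for t in tools if tool_accepts(t, res)]


tools = [
    Tool("pos-tagger", ["text/plain"], ["en", "nl"]),
    Tool("video-annotator", ["video/mpeg"]),
]
print(match(tools, Resource("interview-01", "video/mpeg", "yi")))  # ['video-annotator']
print(match(tools, Resource("story-03", "text/plain", "nl")))      # ['pos-tagger']
```

Keeping the decision in a pure function over metadata means the same check could run on either side of an exchange protocol, whichever repository ends up performing the match.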
2002 Metadata Set and Tools for Multimedia/Multimodal Language Resources
Within the ISLE project (International Standards for Language Engineering), the IMDI Metadata Initiative developed a complete environment for creating, maintaining and using metadata descriptions of multimedia/multimodal language resources. This environment includes a proposal for a suitable metadata set, tools to create, browse and search IMDI metadata domains, and suggestions on how to organize centers that act as metadata repositories. A formulation of the IMDI set in RDF is intended, which would enable it to be integrated into Semantic Web activities.
2002 Cross-Linguistic Studies of Multimodal Communication
Gestures are culture-specific forms of arm movement that are used in communication to transfer information to the listener, to guide the planning of the speech production process and to disambiguate the incoming speech. To understand the underlying mechanisms, gestures have to be analyzed cross-linguistically. This requires large projects covering speakers from various cultural backgrounds and many recordings. Such projects can only be carried out successfully when suitable gesture encoding schemes, generic annotation schemes, powerful tools supporting these schemes, and efficient methods for easy resource discovery and management are available. At the Max-Planck-Institute, all of these aspects were tackled.
2002 Methods of Language Documentation in the DOBES project
The DOBES program for the documentation of endangered languages, started in September 2000, has just completed its pilot phase. Eight documentation teams and one archiving team worked out agreements on formats, tools, naming conventions and encoding, especially encoding at the linguistic level. These standards will form the basis for a five-year main phase, which will include about 20 teams. In the pilot phase, strategies for setting up an online archive with redundancy and regular backups were developed and implemented. Ethical and legal aspects of the archiving process were discussed, resulting in a number of documents to which all participants must adhere. Tools and converters developed in the pilot phase are available to others.
2002 Multimedia Annotation with Multilingual Input Methods and Search Support
A tool set for creating and exploiting complex multimedia/multimodal annotations is described. Because it allows users to define tiers flexibly, to associate languages/writing systems with them and even to mix characters from different writing systems, it is especially suitable for work in multilingual environments. The search interface also supports these multilingual features, allowing users to search for complex patterns in the annotations.
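The described tool set is not reproduced here. As a hedged sketch of the data structure the abstract implies, the following Python assumes that each tier carries one language tag and holds time-aligned annotations, and that searching is a regular-expression match over annotation values. All class and function names, the millisecond time unit and the language codes are assumptions.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Annotation:
    start_ms: int
    end_ms: int
    value: str  # Unicode text; may mix characters from several writing systems


@dataclass
class Tier:
    name: str
    language: str  # language/writing system associated with the whole tier
    annotations: List[Annotation] = field(default_factory=list)


def search(tiers: List[Tier], pattern: str,
           language: Optional[str] = None) -> List[Tuple[str, int, int, str]]:
    """Return (tier, start, end, value) for every annotation matching a regex,
    optionally restricted to tiers associated with one language."""
    rx = re.compile(pattern)
    hits = []
    for tier in tiers:
        if language is not None and tier.language != language:
            continue
        for a in tier.annotations:
            if rx.search(a.value):
                hits.append((tier.name, a.start_ms, a.end_ms, a.value))
    return hits


tiers = [
    Tier("utterance", "ru", [Annotation(0, 1200, "Привет, мир"),
                             Annotation(1300, 2500, "до свидания")]),
    Tier("translation", "en", [Annotation(0, 1200, "Hello, world"),
                               Annotation(1300, 2500, "goodbye")]),
]
print(search(tiers, r"мир"))                    # [('utterance', 0, 1200, 'Привет, мир')]
print(search(tiers, r"(?i)^h", language="en"))  # [('translation', 0, 1200, 'Hello, world')]
```

Attaching the language to the tier rather than to each annotation keeps the per-language filter cheap, while Python's Unicode strings let a single pattern mix scripts.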
2002 Management of Language Resources using Metadata
Advances in technology allow many more researchers than before to create language resources, especially ones with multimedia extensions. This creates a resource management problem that exceeds the boundaries of established resource centers. Metadata environments such as the one proposed by IMDI, which offer both a metadata set and tools that operate on it, have strong potential to help individual researchers carry out their resource management tasks. In addition, they allow researchers to easily integrate their resources into a large distributed domain of resources. The work at the Max-Planck-Institute for Psycholinguistics to establish a large multimedia language corpus helped to clarify the needs and requirements. Thanks to this experience, the IMDI environment has reached a state of maturity, but some important features still have to be added.
2002 Analysis of Lexical Structures from Field Linguistics and Language Engineering
Lexica play an important role in every linguistic discipline, and we are confronted with many types of them. Depending on the type of lexicon and the language, we currently face a large variety of structures, ranging from very simple tables to complex graphs, as a recent overview of the structures found in dictionaries from field linguistics and language engineering indicated. It is important to assess these differences and to aim at integrating lexical resources in order to improve lexicon creation, exchange and reuse. This paper describes the first step towards integrating existing structures and standards into a flexible abstract model.
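The abstract model itself is not given here. The following Python is only a hedged sketch of one way a single structure can span the range the abstract describes: entries carry free attribute/value pairs, and optional typed links turn the collection into a graph, so a flat table is simply the case with no links. `LexEntry`, `Lexicon`, `from_table` and the relation name `derived_from` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class LexEntry:
    # Hypothetical generic entry: a headword plus arbitrary attribute/value pairs.
    headword: str
    attributes: Dict[str, str] = field(default_factory=dict)


@dataclass
class Lexicon:
    entries: Dict[str, LexEntry] = field(default_factory=dict)
    # Typed, directed links between entries: (source, relation, target).
    relations: List[Tuple[str, str, str]] = field(default_factory=list)

    def add(self, entry: LexEntry) -> None:
        self.entries[entry.headword] = entry

    def link(self, source: str, relation: str, target: str) -> None:
        if source not in self.entries or target not in self.entries:
            raise KeyError("both entries must exist before linking them")
        self.relations.append((source, relation, target))

    def related(self, headword: str, relation: str) -> List[str]:
        return [t for s, r, t in self.relations if s == headword and r == relation]


def from_table(rows: List[Dict[str, str]], key: str = "headword") -> Lexicon:
    """A flat table (one row per word) maps onto the model as unlinked entries."""
    lex = Lexicon()
    for row in rows:
        attrs = {k: v for k, v in row.items() if k != key}
        lex.add(LexEntry(row[key], attrs))
    return lex


lex = from_table([
    {"headword": "house", "pos": "noun", "gloss": "dwelling"},
    {"headword": "housing", "pos": "noun", "gloss": "accommodation"},
])
lex.link("housing", "derived_from", "house")  # adding a link makes it a graph
print(lex.related("housing", "derived_from"))  # ['house']
```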
2002 Analysis of Lexical Structures from Field Linguistics and Language Engineering
presentation
2002 Metadata Tools Supporting Controlled Vocabulary Services
Within the ISLE Metadata Initiative (IMDI) project, a user-friendly editor for entering metadata descriptions and a browser operating on the linked metadata descriptions were developed. Both tools support the use of Controlled Vocabulary (CV) repositories: a URL can be specified at which the formal CV definition is available.
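As a hedged illustration of how an editor can rely on a CV specified by URL, the following Python sketch separates fetching from parsing so that the parsing step can be tried offline. The XML layout of the CV file, the sample Genre terms and the function names `parse_cv`, `load_cv` and `is_valid` are assumptions; the actual IMDI CV format may differ.

```python
import urllib.request
import xml.etree.ElementTree as ET
from typing import Set, Union


def parse_cv(xml_data: Union[str, bytes]) -> Set[str]:
    """Collect the allowed terms from a CV definition.
    Assumes a hypothetical layout: <CV><Entry>term</Entry>...</CV>."""
    root = ET.fromstring(xml_data)
    return {e.text.strip() for e in root.iter("Entry") if e.text and e.text.strip()}


def load_cv(url: str) -> Set[str]:
    """Fetch the CV definition from the URL named in the metadata schema."""
    with urllib.request.urlopen(url) as resp:
        return parse_cv(resp.read())


def is_valid(value: str, cv: Set[str]) -> bool:
    """An editor could call this to accept only terms from the vocabulary."""
    return value in cv


sample = ("<CV name='Genre'><Entry>Narrative</Entry>"
          "<Entry>Discourse</Entry><Entry>Song</Entry></CV>")
genre_cv = parse_cv(sample)
print(is_valid("Song", genre_cv))  # True
print(is_valid("Poem", genre_cv))  # False
```

Because the vocabulary lives at a URL rather than inside the tool, it can be updated centrally without redistributing the editor or the browser.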
2002 Metadata Proposals for Corpora and Lexica
A number of metadata proposals appear relevant for establishing a searchable and browsable domain of language resources in which users can easily discover suitable resources on the Web. These proposals differ in their approach, in their descriptive detail, in the set of linguistic data types supported by specific elements, and in their supporting tools. The IMDI initiative, in particular, has worked out a set not only for (multimedia) corpora but also for lexica. All initiatives have declared their commitment to interoperability, in which Dublin Core will play a role in the near future. In the long term, we foresee considerable effort to make the metadata sets compliant with the trends of the Semantic Web and to allow increasing reuse of existing sub-schemas and data categories, which will probably be formulated in RDF.
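One common way to achieve the interoperability mentioned above is to map detailed fields onto Dublin Core elements. The following Python is a minimal sketch of such a mapping, assuming simplified IMDI-style field names; the `FIELD_TO_DC` table, the function `to_dublin_core` and the sample record are hypothetical, while the namespace URI is that of the Dublin Core element set.

```python
import xml.etree.ElementTree as ET
from typing import Dict

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set namespace

# Hypothetical, simplified mapping from detailed (IMDI-style) field names
# to Dublin Core elements; fields without a DC counterpart are dropped.
FIELD_TO_DC = {
    "Title": "title",
    "Date": "date",
    "Genre": "type",
    "ContentLanguage": "language",
    "Collector": "creator",
    "Location": "coverage",
}


def to_dublin_core(record: Dict[str, str]) -> ET.Element:
    """Project a detailed metadata record onto Dublin Core elements."""
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")
    for name, value in record.items():
        dc_name = FIELD_TO_DC.get(name)
        if dc_name is not None:
            ET.SubElement(root, f"{{{DC_NS}}}{dc_name}").text = value
    return root


record = {
    "Title": "Retelling, speaker A",
    "Date": "2001-11-05",
    "ContentLanguage": "nl",
    "Task": "retelling",  # no DC counterpart in this mapping, so it is dropped
}
print(ET.tostring(to_dublin_core(record), encoding="unicode"))
```

The projection is deliberately lossy: Dublin Core offers only 15 broad elements, so a detailed set can be mapped onto it for cross-archive discovery, but the detail cannot be recovered from it.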
2002 Multimodal Annotations in Gesture and Sign Language Studies
For multimodal annotation, an exhaustive encoding system for gestures was developed to facilitate research. The structural requirements of multimodal annotations were analyzed to develop an Abstract Corpus Model, which is the basis for a powerful annotation and exploitation tool for multimedia recordings and for the definition of the XML-based EUDICO Annotation Format. Finally, a metadata-based data management environment has been set up to facilitate resource discovery and especially corpus management. By means of an appropriate digitization policy and the online availability of the recordings, researchers have been able to build up a large corpus covering gesture and sign language data.
 
