Language Archiving Technology

2006

2006 A Web Based General Thesaurus Browser to Support Indexing of Television and Radio Programs
Documentation and retrieval processes at the Netherlands Institute for Sound and Vision are organized around a common thesaurus. To help improve the quality of these processes, the thesaurus was transformed into an RDF/OWL ontology and extended on the basis of implicit information and external resources. A thesaurus browser web application was designed, implemented and tested with future users.
2006 Combining video and numeric data in the analysis of sign languages within the ELAN annotation software
This paper describes hardware and software that can be used for the phonetic study of sign languages. The field of sign language phonetics is characterised, and the hardware that is currently in use is described. The paper focuses on the software that was developed to enable the recording of finger and hand movement data, and the additions to the ELAN annotation software that facilitate the further visualisation and analysis of the data.
2006 Language Resource Archiving supporting Multimodality Research
At the MPI, multimodal research has a long history. An increasing amount of resources is created to test scientific hypotheses, which requires proper methods and technologies to manage these resources. During the last five years, mature tools were developed for these purposes that guide the resources through their whole life-cycle: ELAN can be used to create accurate and complex annotations; IMDI helps the user to create useful metadata descriptions, to model the underlying relations between the resources and to search for suitable resources; LAMUS is used to upload and manage large language resource repositories; and finally ANNEX and LEXUS can be used to access multimodal resources via the web.
2006 ANNEX – a web-based Framework for Exploiting Annotated Media Resources
Manual annotation of media streams, time series data and text sequences is still very time-consuming work that has to be carried out in many areas of linguistics and beyond. Based on many theoretical discussions and practical experiences, professional tools such as ELAN have been deployed to support researchers in their work. Most of these annotation tools run on local computers. However, since more and more language resources are stored in web-accessible archives, researchers want to take advantage of the new possibilities. ANNEX was developed to fill this gap: it allows web-based analysis of complex annotated media streams, so users have to neither download resources nor download and install programs. By simply using a normal web browser, they can start their linguistic work. Due to the architecture of the Internet, ANNEX does not yet offer the option to create annotations, but this feature will come. Users should, however, be aware that media streaming does not offer the same accuracy as playback on local computers.
2006 A Grid of Language Resource Repositories
The DAM-LR (Distributed Access Management for Language Resources) project aims to virtually integrate various European language resource archives, allowing users to navigate and operate in a single unified domain of language resources. This type of integration introduces Grid technology to the humanities disciplines and forms a federation of archives. It is the basis for establishing a research infrastructure for language resources that will ultimately enable eHumanities. The complete architecture has been designed around a few well-known components, and some components have already been tested. Based on the technological insights gathered and on discussions within the international DELAMAN (Digital Endangered Languages and Music Archives Network) network, the ethical and organizational basis for such a federation is defined.
2006 Technologies for a Federation of Language Resource Archives
The DAM-LR project aims to virtually integrate various European language resource archives, allowing users to navigate and operate in a single unified domain of language resources. This type of integration introduces Grid technology to the humanities disciplines and forms a federation of archives. It is the basis for establishing a research infrastructure for language resources that will ultimately enable eHumanities. The complete architecture has been designed around a few well-known components, and some components have already been tested. Based on the technological insights gathered and on discussions within the international DELAMAN network, the ethical and organizational basis for such a federation is defined.
2006 Language Archives – essential pillars for eHumanities
presentation
2006 An API for accessing the Data Category Registry
Central ontologies are increasingly important for managing interoperability between different types of language resources. This was the reason ISO set up a new committee, ISO TC37/SC4, to take care of language resource management issues. Central to the work of this committee is the definition of a framework for a central registry of data categories that are important in the domain of language resources. This paper describes an application programming interface (API) that was designed to request services from this Data Category Registry (DCR). The DCR is operational, and the described API has already been tested from a lexicon application.
2006 From Static Corpora to Dynamic Collections
presentation
2006 ELAN: a Professional Framework for Multimodality Research
The use of computer tools in linguistic research has gained importance with the maturation of media frameworks for handling digital audio and video. The increased use of these tools in gesture, sign language and multimodal interaction studies has led to stronger requirements on the flexibility, the efficiency and in particular the time accuracy of annotation tools. This paper describes the efforts made to make ELAN a tool that meets these requirements, with special attention to developments in the area of time accuracy. Subsequent sections give an overview of other enhancements in the latest versions of ELAN that make it a useful tool in multimodality research.
2006 Towards a Linguist's Workbench supporting eScience Methods
The domain of language resources is fragmented in many dimensions. Institutional fragmentation is currently being addressed by Grid projects, which will allow access to resources across institutional boundaries. While technical encoding and structural/format differences constitute significant challenges, this paper focuses on the problem of the terminological differences encountered when researchers access resources from different projects and creators. We outline two projects that employ a bottom-up approach, and discuss potential extensions towards an eventual Service Oriented Architecture that will bring together all the components required to overcome the various fragmentation boundaries and open the road to an eHumanities environment.
2006 Foundations of Modern Language Resource Archives
A number of serious reasons will convince an increasing number of researchers to store their relevant material in centers that we will call "language resource archives". These archives combine the duty of long-term preservation with the task of giving different user groups access to their material. Access here means that active interaction with the data will be possible, supporting the integration of new data, new versions or commentaries of all sorts. Modern language resource archives will have to adhere to a number of basic principles to fulfill all requirements, and they will have to join federations that create joint language resource domains, making it even simpler for researchers to access the data. This paper attempts to formulate the essential pillars to which language resource archives have to adhere.
2006 Metadata Profile in the ISO Data Category Registry
Metadata descriptions of language resources are becoming increasingly necessary, since the sheer amount of language resources is growing rapidly and especially since we are now creating infrastructures to access these resources via the web through integrated domains of language resource archives. Yet the metadata frameworks offered for the domain of language resources (IMDI and OLAC), although mature, are not as widely accepted as they need to be. A lack of confidence in the stability and persistence of the concepts and formats introduced by these metadata sets seems to be one reason why people do not invest the time needed for metadata creation. Introducing these concepts into an ISO standardization process may convince contributors to make use of the terminology. The availability of an ISO Data Category Registry that includes a metadata profile will also give researchers the opportunity to construct their own metadata set tailored to the needs of the project at hand, while still supporting interoperability.
2006 LAMUS – the Language Archive Management and Upload System
LAMUS is a web-based service that allows researchers to deposit their language resources into a language resource archive. It was developed at the MPI for Psycholinguistics to enforce stricter control of archive coherence and consistency, and to allow wider use of the archiving facilities without increasing the workload for archive and corpus managers. LAMUS is based on the IMDI metadata standard for language resources and offers metadata search and browsing over the archive.
2006 LEXUS, a web-based tool for manipulating lexical resources
LEXUS provides a flexible framework for maintaining lexical structure and content. It is the first implementation of the Lexical Markup Framework model currently being developed at ISO TC37/SC4. Its capabilities include creating lexicon structures, manipulating content and using typed relations. Integration of well-established Data Category Registries is supported to further promote interoperability by giving access to well-established linguistic concepts. Advanced linguistic functionality assists users in cross-lexicon operations such as searching, comparing and merging lexica. To enable use within various user groups, the look and feel of each lexicon can be customized. In the near future, more functionality will be added, including integration with other tools that access lexical content.
2006 Language Archive Utilization
presentation
2006 Comparison of Resource Discovery Methods
It is an ongoing debate whether categorical systems created by experts are an appropriate way to help users find useful resources on the Internet. However, for the much more restricted domain of language documentation, such a category system might still prove reasonable, if not indispensable. This article gives an overview of the IMDI category set in particular and presents a rough evaluation of its practical use at the Max-Planck-Institute Nijmegen.
2006 Perspectives for Ontologies in Linguistics (introduction to the LREC Panel)
presentation
2006 Language Archives at MPI
presentation
2006 Integrated Services for the Language Resource Domain
Integrated services for the language resource domain will enable users to operate in a single unified domain of language resources. This type of integration introduces Grid technology to the humanities disciplines and allows the formation of a federation of archives. The DAM-LR project will establish such a federation, integrating various European language resource archives. The complete architecture has been designed around a few well-known components, and some integrated services have already been tested and are available.
2006 Ontology-based Language Archive Utilization
At the MPI for Psycholinguistics, a large archive of language resources has been created with contributions from many individual researchers and research projects. All of these resources, in particular annotated media streams and multimedia lexica, are accessible via the web and can be used with the help of web-based utilization frameworks. The archive is therefore well suited to motivating users to operate across the boundaries of single corpora and to supporting cross-language work. This, however, is only possible if the problems of interoperability, in particular at the level of linguistic encoding, can be solved efficiently. Two Max Planck Institutes are cooperating to build a framework that allows users to easily create their own practical ontologies and, if desired, to relate their concepts to central ontologies.
2006 Data Gloves in Sign Language Research
presentation
2006 Combining video and numeric data in the analysis of sign languages within the ELAN annotation software
Poster