2006
-
A Web Based General Thesaurus Browser to Support Indexing of Television and Radio Programs
-
Documentation and retrieval processes at the Netherlands Institute for Sound and Vision are organized around a common thesaurus.
To help improve the quality of these processes, the thesaurus was transformed into an RDF/OWL ontology and extended on the basis of
implicit information and external resources. A thesaurus browser web application was designed, implemented and tested with future
users.
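The abstract does not describe the conversion itself. As a minimal sketch only, the Python below assumes that each thesaurus record has a preferred label plus broader and related term links. It uses the W3C SKOS vocabulary in place of the actual RDF/OWL schema and serializes the terms as N-Triples. Adding the inverse links that the source leaves implicit is shown as one simple, assumed example of "extension on the basis of implicit information". The `Term` record, the `to_ntriples` function and the `example.org` namespace are hypothetical.

```python
from dataclasses import dataclass, field

# Assumed, illustrative namespace; the real thesaurus URIs are not given in the abstract.
BASE = "http://example.org/thesaurus/"
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


@dataclass
class Term:
    """A hypothetical thesaurus record: id, preferred label, and links to other ids."""
    term_id: str
    label: str
    broader: list = field(default_factory=list)
    related: list = field(default_factory=list)


def node(term_id: str) -> str:
    return f"<{BASE}{term_id}>"


def literal(text: str, lang: str) -> str:
    # Escape backslashes, double quotes and newlines as N-Triples requires.
    esc = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{esc}"@{lang}'


def to_ntriples(terms, lang="nl"):
    """Return sorted N-Triples lines, adding inverse links left implicit in the source."""
    triples = set()
    for t in terms:
        s = node(t.term_id)
        triples.add(f"{s} <{RDF_TYPE}> <{SKOS}Concept> .")
        triples.add(f"{s} <{SKOS}prefLabel> {literal(t.label, lang)} .")
        for b in t.broader:
            triples.add(f"{s} <{SKOS}broader> {node(b)} .")
            triples.add(f"{node(b)} <{SKOS}narrower> {s} .")  # inferred inverse link
        for r in t.related:
            # skos:related is symmetric, so both directions are emitted.
            triples.add(f"{s} <{SKOS}related> {node(r)} .")
            triples.add(f"{node(r)} <{SKOS}related> {s} .")
    return sorted(triples)


if __name__ == "__main__":
    thesaurus = [
        Term("t1", "muziek"),
        Term("t2", "jazz", broader=["t1"], related=["t3"]),
        Term("t3", "blues", broader=["t1"]),
    ]
    for line in to_ntriples(thesaurus):
        print(line)
```

Because the triples are collected in a set, a link that is stated explicitly in one record and inferred from another appears only once in the output.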
-
Combining video and numeric data in the analysis of sign languages within the ELAN annotation software
-
This paper describes hardware and software that can be used for the phonetic study of sign languages. The field of
sign language phonetics is characterised, and the hardware that is currently in use is described. The paper focuses
on the software that was developed to enable the recording of finger and hand movement data, and the additions to
the ELAN annotation software that facilitate the further visualisation and analysis of the data.
-
Language Resource Archiving supporting Multimodality Research
-
At the MPI, multimodal research has a long history. An increasing number of resources is being created to test
scientific hypotheses, which requires proper methods and technologies to manage these resources. During the last five
years, mature tools have been developed for these purposes that guide the resources through their whole life cycle: ELAN
can be used to create accurate and complex annotations; IMDI helps users create useful metadata descriptions, model the
underlying relations between resources and search for suitable resources; LAMUS is used to upload and manage large
language resource repositories; and finally, ANNEX and LEXUS can be used to access multimodal resources via the web.
-
ANNEX – a web-based Framework for Exploiting Annotated Media Resources
-
Manual annotation of various media streams, time series data and text sequences is still very time-consuming work
that has to be carried out in many areas of linguistics and beyond. Based on many theoretical discussions and practical
experience, professional tools such as ELAN have been deployed to support researchers in their work. Most of
these annotation tools run on local computers. However, since more and more language resources are stored in web-accessible
archives, researchers want to take advantage of the new possibilities. ANNEX was developed to fill this gap:
it allows web-based analysis of complex annotated media streams, so users do not have to download resources
or download and install programs. Using an ordinary web browser, they can start their linguistic
work right away. Due to the architecture of the Internet, ANNEX does not yet offer the option to create annotations, but this
feature will be added. Users should, however, be aware that media streaming does not offer the same accuracy as playback
on local computers.
-
A Grid of Language Resource Repositories
-
The DAM-LR (Distributed Access Management for Language Resources) project aims at virtually integrating various
European language resource archives so that users can navigate and operate in a single unified domain of language
resources. This type of integration introduces Grid technology to the humanities disciplines and forms a federation of
archives. It is the basis for establishing a research infrastructure for language resources that will ultimately
enable eHumanities. The complete architecture has been designed on the basis of a few well-known components, and some
components have already been tested. Building on the technological insights gained and on discussions within the
international DELAMAN (Digital Endangered Languages and Music Archives Network) network, the ethical and
organizational basis for such a federation is defined.
-
Technologies for a Federation of Language Resource Archives
-
The DAM-LR project aims at virtually integrating various European language resource archives so that users can navigate and
operate in a single unified domain of language resources. This type of integration introduces Grid technology to the humanities
disciplines and forms a federation of archives. It is the basis for establishing a research infrastructure for language resources that will
ultimately enable eHumanities. The complete architecture has been designed on the basis of a few well-known components, and some
components have already been tested. Building on the technological insights gained and on discussions within the international
DELAMAN network, the ethical and organizational basis for such a federation is defined.
-
Language Archives – essential pillars for eHumanities
-
presentation
-
An API for accessing the Data Category Registry
-
Central ontologies are increasingly important for managing interoperability between different types of language resources. This was the
reason for ISO to set up a new committee, ISO TC37/SC4, to deal with language resource management issues. Central to the work of
this committee is the definition of a framework for a central registry of data categories that are important in the domain of language
resources. This paper describes an application programming interface (API) that was designed to request services from this Data Category
Registry (DCR). The DCR is operational, and the described API has already been tested with a lexicon application.
-
From Static Corpora to Dynamic Collections
-
presentation
-
ELAN: a Professional Framework for Multimodality Research
-
The use of computer tools in linguistic research has gained importance with the maturation of media frameworks for handling
digital audio and video. The increased use of these tools in gesture, sign language and multimodal interaction studies has raised
the requirements for flexibility, efficiency and, in particular, time accuracy of annotation tools. This paper describes the
efforts made to turn ELAN into a tool that meets these requirements, with special attention to developments in the area of time
accuracy. Subsequent sections give an overview of other enhancements in the latest versions of ELAN that make it a useful
tool in multimodality research.
-
Towards a Linguist's Workbench supporting eScience Methods
-
The domain of language resources is fragmented in many dimensions. Institutional fragmentation is currently being
addressed by Grid projects, which will allow access to resources across institutional boundaries. While technical
encoding and structural/format differences constitute significant challenges, this paper focuses on the problem of
the terminological differences encountered when researchers access resources from different projects and creators.
We outline two projects that employ a bottom-up approach and discuss potential extensions towards an eventual
Service Oriented Architecture that will bring together all the components required to overcome the various
fragmentation boundaries and open the road to an eHumanities environment.
-
Foundations of Modern Language Resource Archives
-
A number of serious reasons will convince an increasing number of researchers to store their relevant material in centers, which we will
call “language resource archives”. These archives combine the duty of long-term preservation with the task of giving different user groups
access to their material. Access here means that active interaction with the data will be possible,
supporting the integration of new data, new versions or commentaries of all sorts. Modern language resource archives will have to
adhere to a number of basic principles to fulfill all requirements, and they will have to join federations that create joint
language resource domains, making it even simpler for researchers to access the data. This paper attempts to
formulate the essential pillars that language resource archives have to adhere to.
-
Metadata Profile in the ISO Data Category Registry
-
Metadata descriptions of language resources are becoming increasingly necessary, since the sheer amount of language resources is
growing rapidly and, especially, since we are now creating infrastructures to access these resources via the web through integrated
domains of language resource archives. Yet the metadata frameworks offered for the domain of language resources (IMDI and
OLAC), although mature, are not as widely accepted as they need to be. A lack of confidence in the stability and persistence of the
concepts and formats introduced by these metadata sets seems to be one reason why people do not invest the time needed for
metadata creation. Introducing these concepts into an ISO standardization process may convince contributors to use
the terminology. An ISO Data Category Registry that includes a metadata profile will also give
researchers the opportunity to construct their own metadata set tailored to the needs of the project at hand while still supporting interoperability.
-
LAMUS – the Language Archive Management and Upload System
-
LAMUS is a web-based service that allows researchers to deposit their language resources into a language resource archive. It was
developed at the MPI for Psycholinguistics to enforce stricter control over archive coherence and consistency and to allow wider use of the
archiving facilities without increasing the workload of archive and corpus managers. LAMUS is based on the IMDI metadata
standard for language resources and offers metadata search and browsing over the archive.
-
LEXUS, a web-based tool for manipulating lexical resources
-
LEXUS provides a flexible framework for maintaining lexical structure and content. It is the first implementation of the Lexical
Markup Framework model currently being developed at ISO TC37/SC4. Its capabilities include creating lexicon
structures, manipulating content and using typed relations. Integration of established Data Category Registries is supported to
further promote interoperability by giving access to well-established linguistic concepts. Advanced linguistic functionality is offered
to assist users in cross-lexicon operations such as searching, comparing and merging lexica. To enable use by various user
groups, the look and feel of each lexicon can be customized. In the near future, more functionality will be added, including integration
with other tools that access lexical content.
-
Language Archive Utilization
-
presentation
-
Comparison of Resource Discovery Methods
-
It is an ongoing debate whether categorical systems created by experts are an appropriate way to help users find useful
resources on the Internet. However, for the much more restricted domain of language documentation, such a category system might still
prove reasonable, if not indispensable. This article gives an overview of the IMDI category set in particular and presents a rough
evaluation of its practical use at the Max-Planck-Institute in Nijmegen.
-
Perspectives for Ontologies in Linguistics (introduction to the LREC panel)
-
presentation
-
Language Archives at MPI
-
presentation
-
Integrated Services for the Language Resource Domain
-
Integrated services for the language resource domain will enable users to operate in a single unified domain of language resources. This type
of integration introduces Grid technology to the humanities disciplines and allows the formation of a federation of archives. The DAM-LR
project will establish such a federation, integrating various European language resource archives. The complete architecture has been designed
on the basis of a few well-known components, and some integrated services have already been tested and are available.
-
Ontology-based Language Archive Utilization
-
At the MPI for Psycholinguistics, a large archive of language resources has been created with contributions from many different
individual researchers and research projects. All of these resources, in particular annotated media streams and multimedia lexica, are
accessible via the web and can be used with the help of web-based utilization frameworks. The archive is therefore well suited to
encouraging users to work across the boundaries of individual corpora and to supporting cross-language work. This, however, is only possible
when the problems of interoperability, in particular at the level of linguistic encoding, can be solved efficiently. Two
Max-Planck-Institutes are cooperating to build a framework that allows users to easily create their own practical ontologies and, if desired,
to relate their concepts to central ontologies.
-
Data Gloves in Sign Language Research
-
presentation
-
Combining video and numeric data in the analysis of sign languages within the ELAN annotation software
-
Poster