How to import Audacity timecodes?

Posted by meikal at 2011-02-09 13:23  

Hi,

I have tokenized recordings using Audacity. Afterwards I would like to import timecodes including annotations into ELAN. I've used the import function in ELAN but there is an issue with timecodes.

After reading on the forum and ELAN manuals, I understood that ELAN requires timecodes in the following format:

HH:MM:SS.XXX

(http://www.lat-mpi.eu/tools/elan/manual/ch04s02s21.html#Fig_Tab-delimited_Text)

But Audacity timecodes are in the following format:

577,921150 578,889293 600W001
581,996355 583,182143 600W001
591,841705 592,861965 600W002
608,417645 609,186906 600W003
619,168532 620,331804 600W004
649,541195 650,787022 600W005

I take these to be seconds and fractions of a second. So either I need to convert the timecodes, or change how Audacity exports them, or ELAN needs additional import filters.

Any help would be strongly appreciated.

Thanks,

Meikal

Re: How to import Audacity timecodes?

Posted by eric at 2011-02-09 14:04  

Hi Meikal! If I understand correctly, Audacity uses the syntax "seconds,microseconds TAB seconds,microseconds TAB content", where the start time equals the end time for point-in-time annotations and differs for timespan annotations. ELAN's "hours:minutes:seconds.milliseconds" import also accepts plain "seconds.fraction" values: any precision is accepted and converted internally to milliseconds. So you can search and replace every "," with "." and then use the CSV import.
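
For anyone who prefers a script to a manual search and replace, here is a minimal Python sketch of that step. It assumes the label file has two time columns followed by the label text, separated by tabs or other whitespace; the function name is made up for illustration. Only the two time columns are touched, so commas inside the label text survive.

```python
def audacity_to_tab_delimited(text):
    """Turn decimal commas into decimal points in the two time columns."""
    out = []
    for line in text.splitlines():
        parts = line.split(None, 2)  # start, end, optional label text
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        start = parts[0].replace(",", ".")
        end = parts[1].replace(",", ".")
        label = parts[2].strip() if len(parts) > 2 else ""
        out.append("\t".join([start, end, label]))
    return "\n".join(out) + "\n"


sample = "577,921150\t578,889293\t600W001\n"
print(audacity_to_tab_delimited(sample), end="")
```

The result can then be fed to the CSV/Tab-delimited text import.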

Another option is using the Avatech recognizer tier import, which allows both "seconds.fraction" and integer milliseconds for timestamps, so you can use:

1. replace "," with "." and TAB with ";"
2. add a header line saying "#start;#end;content"
3. import the file as Avatech tier CSV file

The menu item for this is called "import tiers (lines) from recognizer" or similar in ELAN 4.0 and supports both XML and CSV formats. While you have to add a header line for this method, it takes you fewer clicks than using the generic CSV import of ELAN where you have to specify manually which column contains which field.
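
The three steps for the recognizer import can be sketched the same way. This is only an illustration under the assumptions stated in this thread: the header line is exactly "#start;#end;content", and the label text contains no ";" of its own (this sketch does not escape it). The function name is hypothetical.

```python
def audacity_to_recognizer_csv(text):
    """Build a ';'-separated file with the header line described above."""
    lines = ["#start;#end;content"]
    for line in text.splitlines():
        parts = line.split(None, 2)  # start, end, optional label text
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        start = parts[0].replace(",", ".")
        end = parts[1].replace(",", ".")
        label = parts[2].strip() if len(parts) > 2 else ""
        lines.append(";".join([start, end, label]))
    return "\n".join(lines) + "\n"


sample = "577,921150\t578,889293\t600W001\n591,841705\t592,861965\t600W002\n"
print(audacity_to_recognizer_csv(sample), end="")
```

Save the output with a .csv extension before importing it.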

Re: Re: How to import Audacity timecodes?

Posted by meikal at 2011-02-09 15:42  

Hi Eric,

thanks for your reply.

Using the "import tiers (lines) from recognizer" method, import did work now, after making the changes and changing the file type from .txt to .csv. Using the TXT import didn't work.

But it seems that text encoding is not being correctly recognized. I have some transcriptions in Unicode, which appear all garbled. Notepad++ identifies files as UTF-8 w/o BOM. I'm pretty sure encoding is correct. How could I fix that?

Also, for most transcriptions I'm using my own identifier, such as 600WXXX, e.g. 600W123, which is again referenced in my Toolbox file. Do you see any way to crossreference data - i.e. import Audacity time codes, import Toolbox DB, associate Toolbox Marker with tier, automatically replace ID / associate ID with Marker content?

Thanks,

Meikal

Re: Re: Re: How to import Audacity timecodes?

Posted by hasloe at 2011-02-14 15:32  

In the currently available ELAN version the import module for recognizer csv files assumes a (system) default encoding. In the next version UTF-8 will be assumed. The current Import->CSV/Tab-delimited text function already assumes UTF-8 as the encoding. For the time being you can try that?
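
Before switching import methods, it may be worth confirming that the file really is valid UTF-8 and has no byte order mark. A small Python check, as a sketch (the function name is made up; it only inspects the raw bytes and says nothing about how ELAN reads them):

```python
import codecs


def check_utf8(data):
    """Return (is_valid_utf8, has_bom) for the raw bytes of a file."""
    has_bom = data.startswith(codecs.BOM_UTF8)
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False, has_bom
    return True, has_bom


print(check_utf8("600W001 ŋa".encode("utf-8")))
```

For a real file, read it in binary mode, e.g. `check_utf8(open(path, "rb").read())`.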
Cross-referencing while importing a Toolbox file is not supported. One thing to try is to import the Audacity timecodes into ELAN, export this to Toolbox (which will create lines like \ELANbegin and \ELANend for each \600Wxxx element) and then merge the result with a copy of the actual Toolbox file, for example with a script or an advanced "find and replace".
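
As a rough idea of what such a merge script could look like, here is a Python sketch. It is a variant of the suggestion, not the same procedure: it skips the ELAN round trip and copies the times straight from the Audacity labels into the Toolbox text. Several things are assumptions: that each Toolbox record starts with a `\ref` line whose value is the identifier (e.g. 600W001), that the time fields are spelled `\ELANBegin` and `\ELANEnd`, and that when an identifier occurs more than once in the labels the first occurrence is the one to keep. Times are written as hh:mm:ss.mmm.

```python
def to_hhmmss(seconds_text):
    """Format '577,921150' (seconds, decimal comma or point) as hh:mm:ss.mmm."""
    total_ms = round(float(seconds_text.replace(",", ".")) * 1000)
    hours, rest = divmod(total_ms, 3600000)
    minutes, rest = divmod(rest, 60000)
    seconds, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{ms:03d}"


def parse_labels(text):
    """Map identifier -> (start, end); the first occurrence of an identifier wins."""
    times = {}
    for line in text.splitlines():
        parts = line.split(None, 2)
        if len(parts) == 3:
            start, end, label = parts
            times.setdefault(label.strip(), (to_hhmmss(start), to_hhmmss(end)))
    return times


def add_times(toolbox_text, times, ref_marker="\\ref"):
    """Insert time fields after every reference line with a known identifier."""
    out = []
    for line in toolbox_text.splitlines():
        out.append(line)
        fields = line.split(None, 1)
        if len(fields) == 2 and fields[0] == ref_marker:
            ident = fields[1].strip()
            if ident in times:
                start, end = times[ident]
                out.append(f"\\ELANBegin {start}")
                out.append(f"\\ELANEnd {end}")
    return "\n".join(out) + "\n"


labels = "577,921150\t578,889293\t600W001\n"
toolbox = "\\ref 600W001\n\\tx some text\n"
print(add_times(toolbox, parse_labels(labels)), end="")
```

Records whose identifier has no matching label are left unchanged, so they would still need manual alignment.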

-Han

Re: Re: Re: Re: How to import Audacity timecodes?

Posted by Rsinger at 2011-06-16 05:40  

Hi, I am in a similar situation. I have aiff marker numbers which correspond to line numbers in my Toolbox texts and I am now in the process of converting this system to elan.

Through the use of regular expressions in the free program TextWrangler I am able to add the empty fields \ELANBegin and \ELANEnd to my Toolbox transcription, after the \ref field. I then copy and paste the timecodes into these fields from the aiff markers. Then using regular expressions in TextWrangler I am able to convert my timecodes which are in the format m'ss.msec into one of the two formats required by Elan when importing Toolbox texts. I use the hh:mm:ss.msec format.

So far it seems to be working, importing this file into Elan using the Toolbox import option and choosing "extract time from record marker". The only problem is that for some lines I only added an aiff marker for the start of the line not the end! Luckily Elan does not care if the ELANBegin or ELANEnd fields are empty (it just applies the 5000msec block duration) - but unfortunately for me I still need to do some manual time aligning once it is imported.

Would be happy to have any more suggestions about how to improve the process. Thanks to Han Sloetjes, Paul Trilsbeek, Tom Honeyman and Nick Thieberger for their suggestions over the years (I've been putting this off for a while as even with this process there is still a bit of work to do at the end). Cheers, Ruth
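
The timecode conversion described in this post can also be scripted. The Python sketch below is not the TextWrangler expressions used above; it is an equivalent under the assumption that times look like 9'37.921 (minutes, apostrophe, seconds, point, fraction) and sit alone on \ELANBegin or \ELANEnd lines. Other lines, including empty time fields, are passed through untouched.

```python
import re

# Matches e.g. "\ELANBegin 9'37.921"; groups: marker, minutes, seconds, fraction.
TIME_FIELD = re.compile(r"^(\\ELAN(?:Begin|End))\s+(\d+)'(\d{1,2})\.(\d+)\s*$")


def convert_line(line):
    """Rewrite one m'ss.msec time field as hh:mm:ss.msec; return other lines as-is."""
    match = TIME_FIELD.match(line)
    if not match:
        return line
    marker, minutes, seconds, fraction = match.groups()
    hours, minutes = divmod(int(minutes), 60)
    return f"{marker} {hours:02d}:{minutes:02d}:{int(seconds):02d}.{fraction}"


def convert_toolbox_times(text):
    """Apply convert_line to every line of a Toolbox text."""
    return "\n".join(convert_line(line) for line in text.splitlines()) + "\n"


sample = "\\ref 12\n\\ELANBegin 9'37.921\n\\ELANEnd\n"
print(convert_toolbox_times(sample), end="")
```

Restricting the match to the two time fields keeps the script from rewriting anything that merely looks like a timecode inside the transcription lines.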