How to import Audacity timecodes?
Hi,
I have tokenized recordings in Audacity and would now like to import the timecodes, including the annotations, into ELAN. I used the import function in ELAN, but there is a problem with the timecodes.
After reading the forum and the ELAN manual, I understood that ELAN requires timecodes in the following format:
HH:MM:SS.XXX
(http://www.lat-mpi.eu/tools/elan/manual/ch04s02s21.html#Fig_Tab-delimited_Text)
But Audacity timecodes are in the following format:
577,921150 578,889293 600W001
581,996355 583,182143 600W001
591,841705 592,861965 600W002
608,417645 609,186906 600W003
619,168532 620,331804 600W004
649,541195 650,787022 600W005
I take these to be seconds and fractions of a second, written with a decimal comma. So either I need to convert the timecodes, or change how Audacity exports them, or ELAN needs additional import filters.
Any help would be strongly appreciated.
Thanks,
Meikal
Re: How to import Audacity timecodes?
Hi Meikal! If I understand correctly, Audacity uses a "seconds,microseconds TAB seconds,microseconds TAB content" syntax, i.e. seconds with a decimal comma and six fractional digits, where the start time equals the end time for point-in-time annotations and differs for timespan annotations. ELAN's "hours:minutes:seconds.milliseconds" import also accepts plain "seconds.fraction" values: any precision is accepted and converted to milliseconds internally. So you can search and replace every "," with "." and then use the CSV import.
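For anyone who prefers a script to a manual search and replace, here is a minimal Python sketch of that conversion. It assumes the label file is tab-separated as described above (the sample lines in the first post are shown with spaces) and only touches the two time fields, so commas inside the annotation text are left alone. The function names are made up for illustration; the optional clock format is the HH:MM:SS.XXX layout quoted in the question.

```python
def seconds_to_hhmmss(value: str) -> str:
    """Turn decimal-comma seconds such as '577,921150' into 'HH:MM:SS.mmm'."""
    total_ms = round(float(value.replace(",", ".")) * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"


def convert_label_line(line: str, clock_format: bool = False) -> str:
    """Convert one Audacity label line; assumes 'start TAB end TAB text'."""
    # Split at most twice so tabs inside the annotation text survive.
    start, end, text = line.rstrip("\r\n").split("\t", 2)
    if clock_format:
        start, end = seconds_to_hhmmss(start), seconds_to_hhmmss(end)
    else:
        # Plain "seconds.fraction": only swap the decimal separator.
        start, end = start.replace(",", "."), end.replace(",", ".")
    return "\t".join((start, end, text))


if __name__ == "__main__":
    sample = "577,921150\t578,889293\t600W001"
    print(convert_label_line(sample))
    print(convert_label_line(sample, clock_format=True))
```

The result is still tab-delimited, so it should be usable with the generic CSV / tab-delimited text import, where the columns are assigned by hand.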
Another option is using the Avatech recognizer tier import, which allows both "seconds.fraction" and integer milliseconds for timestamps, so you can use:
1. replace "," with "." and TAB with ";"
2. add a header line saying "#start;#end;content"
3. import the file as Avatech tier CSV file
The menu item for this is called "import tiers (lines) from recognizer" or similar in ELAN 4.0, and it supports both XML and CSV formats. Although you have to add a header line for this method, it takes fewer clicks than the generic CSV import of ELAN, where you have to specify manually which column contains which field.
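The three steps above could be scripted roughly as follows. This is only a sketch: it assumes a tab-separated, UTF-8 encoded Audacity label file, and the function names and the `encoding` parameter are invented for illustration. It does not escape semicolons that occur inside the annotation text, so such labels would need separate handling.

```python
from pathlib import Path

HEADER = "#start;#end;content"  # header line from step 2


def labels_to_recognizer_csv(lines):
    """Build the CSV lines (header included) from Audacity label lines."""
    out = [HEADER]
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        # Assumed layout: start TAB end TAB text
        start, end, text = line.split("\t", 2)
        # Step 1: decimal comma -> dot (time fields only), TAB -> ";"
        out.append(";".join((start.replace(",", "."), end.replace(",", "."), text)))
    return out


def convert_file(src, dst, encoding="utf-8"):
    """Read a UTF-8 label file and write the CSV in the chosen encoding."""
    lines = Path(src).read_text(encoding="utf-8").splitlines()
    Path(dst).write_text("\n".join(labels_to_recognizer_csv(lines)) + "\n",
                         encoding=encoding)


if __name__ == "__main__":
    for row in labels_to_recognizer_csv(["577,921150\t578,889293\t600W001"]):
        print(row)
```

Calling `convert_file("labels.txt", "labels.csv")` (both file names are placeholders) would then produce a file for step 3.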
Re: Re: How to import Audacity timecodes?
Hi Eric,
thanks for your reply.
Using the "import tiers (lines) from recognizer" method, the import now works, after making the changes you described and renaming the file from .txt to .csv. Using the TXT import didn't work.
But it seems that text encoding is not being correctly recognized. I have some transcriptions in Unicode, which appear all garbled. Notepad++ identifies files as UTF-8 w/o BOM. I'm pretty sure encoding is correct. How could I fix that?
Also, for most transcriptions I'm using my own identifiers of the form 600WXXX, e.g. 600W123, which are also referenced in my Toolbox file. Do you see any way to cross-reference the data, i.e. import the Audacity timecodes, import the Toolbox DB, associate a Toolbox marker with a tier, and automatically replace each ID with (or associate it with) the marker content?
Thanks,
Meikal
Re: Re: Re: How to import Audacity timecodes?
In the currently available ELAN version, the import module for recognizer CSV files assumes the system default encoding. In the next version, UTF-8 will be assumed. The current Import->CSV/Tab-delimited text function already assumes UTF-8, so for the time being you could try that instead.
Cross-referencing while importing a Toolbox file is not supported. What you could try is to import the Audacity timecodes into ELAN, export the result to Toolbox (which will create lines like \ELANbegin and \ELANend for each \600Wxxx element), and then merge that export with a copy of the actual Toolbox file, either with a script or with an advanced "find and replace".
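As a rough illustration of what such a merge script might look like, here is a Python sketch. It is a variation on the suggestion above: it takes the times directly from the Audacity label lines instead of from an ELAN Toolbox export. Several things are assumptions: that each Toolbox record carries its ID on a line such as "\ref 600W001" (the \ref marker is hypothetical), that the times may be written as seconds with a decimal point, and the function names themselves. Check the time format and marker spelling against a real ELAN export before relying on it.

```python
def read_timecodes(label_lines):
    """Map each label (e.g. '600W001') to its (start, end) times."""
    times = {}
    for line in label_lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue
        # Assumed layout: start TAB end TAB label
        start, end, label = line.split("\t", 2)
        # If an ID occurs more than once, only the first occurrence is kept.
        times.setdefault(label.strip(),
                         (start.replace(",", "."), end.replace(",", ".")))
    return times


def merge_into_toolbox(toolbox_lines, times, id_marker="\\ref"):
    """Insert \\ELANbegin / \\ELANend lines after each matching ID line."""
    merged = []
    for line in toolbox_lines:
        line = line.rstrip("\r\n")
        merged.append(line)
        marker, _, value = line.partition(" ")
        if marker == id_marker and value.strip() in times:
            start, end = times[value.strip()]
            merged.append(f"\\ELANbegin {start}")
            merged.append(f"\\ELANend {end}")
    return merged


if __name__ == "__main__":
    times = read_timecodes(["577,921150\t578,889293\t600W001"])
    for out_line in merge_into_toolbox(["\\ref 600W001", "\\tx example"], times):
        print(out_line)
```

Records whose ID has no matching label are passed through unchanged, so the script can be run on a copy of the full Toolbox file.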
-Han