4.2.18. Importing a document from Shoebox[1]
ELAN supports the import of documents from Shoebox, thereby allowing you to link transcribed and/or interlinearized documents to the time axis of media files. In order to import from Shoebox, you need at least the following two files:
the Shoebox file (*.txt);
the media file(s) (*.mpg, *.mov, *.wav, etc.).
Optionally, you can use the corresponding Shoebox database type file (*.typ). If this file is not available, you have to provide a list of field markers (= tier names).
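If no *.typ file is at hand, the list of field markers can be collected from the Shoebox file itself. The following is a minimal sketch, not part of ELAN: it assumes that every field starts on its own line with a backslash followed by the marker name, and that markers beginning with an underscore (such as a file header line) are bookkeeping entries that should be skipped. The function name `list_field_markers` and the sample markers `\ref`, `\tx` and `\ge` are hypothetical.

```python
import re

# A field marker is a backslash at the start of a line followed by the
# marker name, e.g. "\tx" in "\tx some text" (assumed file layout).
MARKER_RE = re.compile(r"^\\(\S+)")


def list_field_markers(shoebox_text):
    """Return the distinct field markers in order of first appearance."""
    markers = []
    for line in shoebox_text.splitlines():
        match = MARKER_RE.match(line)
        if not match:
            continue  # continuation line or blank line
        name = match.group(1)
        # Assumption: markers starting with "_" are header/bookkeeping lines.
        if name.startswith("_") or name in markers:
            continue
        markers.append(name)
    return markers


# Hypothetical sample with two records.
sample = "\n".join([
    r"\_sh v3.0  400  Text",
    r"\ref test 001",
    r"\tx hello world",
    r"\ge greeting world",
    "",
    r"\ref test 002",
    r"\tx bye",
])

print(list_field_markers(sample))
```

The resulting names are the ones you would enter as field markers when importing without a *.typ file.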
| Note |
|---|
| If you do not know the Shoebox database type file, do the following: |
Importing Shoebox files with a TYP file
To import a Shoebox file into ELAN, do the following:
Click on . The dialog box appears.
Specify the name and directory of the two files, e.g.:
Like *.eaf documents, the Shoebox file and the media file(s) do not necessarily need to have the same name, and they do not need to be in the same directory (see Section 4.1).
If the Shoebox file contains both aligned (i.e. containing time information) and non-aligned records, the aligned ones will maintain the timing, whereas the location of the non-aligned records will be interpolated automatically.
Click to import the file; otherwise click to exit the dialog box without importing the file.
An ELAN window containing the imported Shoebox file appears.
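The interpolation of non-aligned records mentioned in the steps above can be pictured with a small sketch. The manual does not specify how ELAN computes the interpolated positions, so the scheme below is purely an assumption for illustration: a run of non-aligned records shares the gap between the surrounding aligned records equally, and a run at the end of the file gets a fixed default length per record. The function name `interpolate_unaligned` and the `default_ms` parameter are hypothetical.

```python
def interpolate_unaligned(records, default_ms=3000):
    """Fill in (begin, end) times in milliseconds for non-aligned records.

    records: list in file order; each item is a (begin_ms, end_ms) tuple
    for an aligned record or None for a non-aligned one.
    """
    result = list(records)
    i = 0
    while i < len(result):
        if result[i] is not None:
            i += 1
            continue
        # Find the end of this run of non-aligned records.
        j = i
        while j < len(result) and result[j] is None:
            j += 1
        # The run starts where the previous aligned record ends (or at 0).
        gap_begin = result[i - 1][1] if i > 0 else 0
        if j < len(result):
            # Share the gap up to the next aligned record equally.
            step = (result[j][0] - gap_begin) / (j - i)
        else:
            # No aligned record follows: use a fixed length (assumption).
            step = default_ms
        for k in range(i, j):
            begin = gap_begin + (k - i) * step
            result[k] = (round(begin), round(begin + step))
        i = j
    return result


records = [(0, 1000), None, None, (4000, 5000), None]
print(interpolate_unaligned(records))
```

The aligned records keep their times unchanged; only the `None` entries are filled in. The sketch does not handle aligned records that overlap each other.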
Importing Shoebox files without a TYP file
Instead of using a Shoebox *.typ file, you can also define the field markers yourself when importing a Shoebox file.
1. Select the Set field markers option and click on the button in the import dialog:
2. Fill in a field marker as used in the Shoebox *.txt file.
3. Optionally select a parent marker (see Section 5.1).
4. Optionally select a stereotype (symbolic subdivision or association, see Section 5.1).
5. Choose a character set (Latin-1, SIL IPA or UTF-8) for the tier.
6. Click on Add.
7. Repeat steps 2 to 6 for all field markers.
If the selected marker designates a participant, check the corresponding checkbox. If you don’t want the selected marker to be imported, tick the corresponding checkbox.
Finally, choose and click on in the import Shoebox file dialog.
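The information entered for each field marker in the steps above can be thought of as one small record per marker. The sketch below is only an illustration of that structure and is not ELAN's internal representation or file format: the class `MarkerDef`, its field names and the function `check_marker_set` are hypothetical. It checks a marker set for the kind of slips that are easy to make by hand, such as a parent marker that was never defined.

```python
from dataclasses import dataclass
from typing import Optional

# Values named in this section; None stands for "no stereotype".
STEREOTYPES = {None, "Symbolic Subdivision", "Symbolic Association", "Time Subdivision"}
CHARSETS = {"Latin-1", "SIL IPA", "UTF-8"}


@dataclass
class MarkerDef:
    marker: str                       # field marker as used in the *.txt file
    parent: Optional[str] = None      # optional parent marker
    stereotype: Optional[str] = None  # optional stereotype
    charset: str = "UTF-8"            # character set for the tier
    participant: bool = False         # marker designates a participant
    exclude: bool = False             # marker is not imported


def check_marker_set(defs):
    """Return a list of problems found in the marker set (empty if none)."""
    problems = []
    names = {d.marker for d in defs}
    for d in defs:
        if d.parent is not None and d.parent not in names:
            problems.append(f"{d.marker}: unknown parent marker {d.parent!r}")
        if d.stereotype not in STEREOTYPES:
            problems.append(f"{d.marker}: unknown stereotype {d.stereotype!r}")
        if d.charset not in CHARSETS:
            problems.append(f"{d.marker}: unknown character set {d.charset!r}")
    return problems


# Hypothetical marker set; "ge" refers to a parent that was never defined.
defs = [
    MarkerDef("ref"),
    MarkerDef("tx", parent="ref", stereotype="Symbolic Subdivision"),
    MarkerDef("ge", parent="mb", stereotype="Symbolic Association"),
]
print(check_marker_set(defs))
```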
Loading and storing Markers
Once you have manually created a set of field markers, you might want to reuse them later on. ELAN provides support for this:
To save a set of field markers, select the button. This will display a save dialog. Enter a filename and press save.
In the same way, you can open a stored field marker set by clicking on the corresponding button.
Connecting the transcription to a media file
Once the import has succeeded, you can add a reference to a media file via the menu, as described in Section 4.2.11. If the imported Shoebox file was exported from ELAN before, you won’t need to establish the link to the media file(s) again, as in that case the location information is stored in the file.
About the import process
ELAN imports Shoebox files according to the following conventions:
The Shoebox field markers are imported as ELAN tiers. The tier label is identical to that of the field marker, except for the added extension @‘Speaker-ID’.
This addition is necessary because ELAN and Shoebox differ in how they code information about multiple speakers:
In ELAN, each speaker is coded on a separate tier.
In Shoebox, all speakers are coded using the same field, and their identity is specified in a separate field.
When importing texts by multiple speakers, ELAN splits each Shoebox field into several ELAN tiers (one for each speaker) and adds the speaker-ID to the tier label.
If speaker information is not specified in the Shoebox file, the extension @unknown is added.
The following screenshot illustrates how ELAN treats texts by multiple speakers:
Note that ELAN can only read speaker information if:
A marker is defined as a Participant marker in the Set field marker dialog (see Importing Shoebox files without a TYP file above), or if:
It is coded in a Shoebox field labeled \EUDICOp or \ELANParticipant (see illustration above). If this field is not present, or if speaker information is coded in a different field, ELAN will assume that there is only one speaker. That is, if you have multiple speakers and want ELAN to assign them to separate tiers, do the following:
For every Shoebox record, add the field marker \EUDICOp.
For every Shoebox record, enter the relevant speaker-ID into this field.
| Note |
|---|
| When the file is exported back to Shoebox (see Section 4.2.23), the extension @‘Speaker-ID’ is automatically dropped from the field marker, and the Shoebox records are sorted according to their record marker (e.g., in the above illustration, “test 001” is sorted before “test 002”, etc.). |
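The speaker convention described above can be sketched in a few lines. This is an illustration of the naming rule only, not ELAN's actual import code. It assumes that each Shoebox record has already been read into a dictionary from marker to text, and that the participant field itself does not become a tier; the function names `tier_name`, `split_by_speaker` and `export_marker` and the sample markers `ref` and `tx` are hypothetical.

```python
# Fields in which ELAN looks for speaker information (from this section).
PARTICIPANT_MARKERS = ("EUDICOp", "ELANParticipant")


def tier_name(marker, speaker=None):
    """Build the tier label: marker plus '@' plus speaker ID (or 'unknown')."""
    return f"{marker}@{speaker or 'unknown'}"


def split_by_speaker(records):
    """Distribute the fields of each record over per-speaker tiers.

    records: list of dicts mapping marker -> text, one dict per record.
    Returns a dict mapping tier label -> list of texts.
    """
    tiers = {}
    for record in records:
        speaker = next(
            (record[m] for m in PARTICIPANT_MARKERS if record.get(m)), None
        )
        for marker, text in record.items():
            if marker in PARTICIPANT_MARKERS:
                continue  # assumption: the participant field is not a tier
            tiers.setdefault(tier_name(marker, speaker), []).append(text)
    return tiers


def export_marker(tier_label):
    """Drop the '@Speaker-ID' extension again, as on export to Shoebox."""
    return tier_label.rpartition("@")[0]


records = [
    {"ref": "test 001", "tx": "hello", "EUDICOp": "A"},
    {"ref": "test 002", "tx": "hi", "EUDICOp": "B"},
    {"ref": "test 003", "tx": "hm"},  # no speaker information
]
tiers = split_by_speaker(records)
print(sorted(tiers))
```

Records without speaker information end up on tiers with the extension @unknown, matching the convention stated above.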
Based on the information contained in the Shoebox database type file, the tiers are brought into a hierarchical relationship and are assigned to linguistic types (see Section 5.1 for details of tier hierarchies and linguistic types). For every tier name a corresponding linguistic type with the same name is created. All of these linguistic types are connected with a stereotype in such a way that it fits with the original Shoebox structure.
The Shoebox record marker is assigned to the stereotype None, i.e., it is an independent, time-alignable parent tier.
The transcription and parsing fields of Shoebox are assigned to the stereotype Symbolic Subdivision, i.e., they are referring tiers that can be subdivided into smaller units.
All other fields are assigned to the stereotype Symbolic Association, i.e., they are referring tiers that cannot be subdivided into smaller units.
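The three assignment rules above amount to a simple lookup. The sketch below restates them in code under the assumption that you already know which marker is the record marker and which markers are transcription or parsing fields (in ELAN this information comes from the *.typ file). The function name `linguistic_types` and the sample markers are hypothetical.

```python
def linguistic_types(markers, record_marker, subdividable):
    """Map each marker to the stereotype of its linguistic type.

    record_marker: the Shoebox record marker (stereotype None).
    subdividable: markers of transcription and parsing fields.
    """
    types = {}
    for marker in markers:
        if marker == record_marker:
            types[marker] = None  # independent, time-alignable parent tier
        elif marker in subdividable:
            types[marker] = "Symbolic Subdivision"
        else:
            types[marker] = "Symbolic Association"
    return types


# Hypothetical marker set: "ref" is the record marker, "tx" and "mb" are
# the transcription and parsing fields, everything else is associated.
types = linguistic_types(
    ["ref", "tx", "mb", "ge", "ft"],
    record_marker="ref",
    subdividable={"tx", "mb"},
)
print(types)
```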
If you define the markers yourself, you can also choose the Time Subdivision stereotype. For example:
All SIL IPA characters are converted into Unicode characters during import. If you export the file back into Shoebox (see Section 4.2.23), the Unicode characters will be converted back into SIL IPA characters.
Unless it already contains time code information, the imported Shoebox file initially has no information about timing. Instead, ELAN automatically assigns each Shoebox record to a three-second time interval, as in the following illustration:
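The default placement can also be sketched in code. The manual only states that each record gets a three-second interval; that the intervals follow one another without gaps and start at time zero is an assumption made here for illustration. The function name `default_intervals` is hypothetical.

```python
def default_intervals(n_records, length_ms=3000):
    """Return one (begin_ms, end_ms) interval per record.

    Assumption: intervals are consecutive and start at 0 ms.
    """
    return [(i * length_ms, (i + 1) * length_ms) for i in range(n_records)]


print(default_intervals(3))
```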
The time alignment has to be done manually for each Shoebox record. Do the following:
1. Activate the Bulldozer mode: Click on (see Section 5.6.8 for the three available modes).
   Note: If you do not activate the Bulldozer mode, you will inadvertently overwrite and thereby delete existing annotations. Make sure that it is enabled in the menu.
2. Click on the first annotation on the parent tier (i.e., the first Shoebox record). It appears in a dark blue frame.
3. Modify the boundaries of that annotation, so that they are aligned with the correct time interval (see Section 5.6.6 for ways of modifying boundaries).
4. Press CTRL+ENTER to apply the new time interval.
   The parent annotation (together with all its referring annotations) is assigned to the new time interval. All other parent annotations are moved to the right.
5. Repeat steps 2 to 4 for each parent annotation.
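What happens to the remaining parent annotations in the procedure above can be illustrated with a small sketch. It is not ELAN's implementation, and the exact behaviour of the Bulldozer mode is described in Section 5.6.8. The sketch assumes that annotations are sorted by time, that the new interval does not reach back into the preceding annotation, and that following annotations are pushed to the right by just as much as is needed to avoid an overlap, keeping their duration. The function name `bulldoze` is hypothetical.

```python
def bulldoze(annotations, index, new_begin, new_end):
    """Assign a new interval to one annotation and push later ones right.

    annotations: list of (begin_ms, end_ms) tuples sorted by begin time.
    Returns a new list; the input is left unchanged.
    """
    result = [list(a) for a in annotations]
    result[index] = [new_begin, new_end]
    for k in range(index + 1, len(result)):
        overlap = result[k - 1][1] - result[k][0]
        if overlap <= 0:
            break  # no collision, so later annotations stay where they are
        # Shift the annotation right by the overlap, keeping its duration.
        result[k][0] += overlap
        result[k][1] += overlap
    return [tuple(a) for a in result]


# Three records at their default three-second intervals; the first one is
# aligned to 0.5 s - 4.2 s, which pushes the other two to the right.
annotations = [(0, 3000), (3000, 6000), (6000, 9000)]
print(bulldoze(annotations, 0, 500, 4200))
```

Without the shifting step, the new interval of the first annotation would overlap the second one, which is the situation the note on Bulldozer mode warns about.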
The following screenshot illustrates steps 1 to 4:
After you have done the time alignment, you can export the file back to Shoebox; in this case, the time code information will be kept (see Section 4.2.23). If you then re-import the file into ELAN, ELAN automatically assigns the Shoebox records to their correct time intervals.
An imported Shoebox file can be saved as an ELAN file (see Section 4.2.5), exported back into Shoebox (see Section 4.2.23), or exported as a tab-delimited text (see Section 4.2.25).
[1] From here on, every appearance of Shoebox can also be read as Toolbox, i.e. the newer version of what was formerly known as Shoebox.









