
Language Archiving Technology


3.1. Data file inconsistent with the hierarchy of the structure file


The kinds of problems your data file can cause are few, but the sheer size of a typical data file, with hundreds of entries, makes it very important to think in advance about the discrepancies that might arise when importing your data into LEXUS. There is, however, a very simple solution to all of them, an idea that should underlie any lexical enterprise in any case: consistency.

In the previous section we stressed the importance of having a clear idea of how you want to organize your markers. Once you have settled on that organization, it is crucial to stick to the same order in your data file. In practice, of course, such an order often only emerges over time, as more and more information about the lexicon becomes available. For Toolbox it makes little difference whether you stick to the hierarchy, since the markers you use are simply listed separately for every entry.
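Because Toolbox simply lists the markers of each entry one after another, an entry can be thought of as an ordered list of (marker, value) pairs, and that order is exactly what matters to LEXUS. The sketch below is a simplified reader for Toolbox-style standard format text, assuming one backslash-coded marker at the start of each field line and treating lines without a backslash as continuations of the previous field. It is not Toolbox's or LEXUS's own parser; the name `parse_entry` and the sample data are hypothetical.

```python
import re

# A field line looks like "\xv some text": a backslash, the marker, then an optional value.
FIELD = re.compile(r"^\\(\S+)(?:[ \t]+(.*))?$")


def parse_entry(text):
    """Return one entry's fields as an ordered list of (marker, value) pairs."""
    fields = []
    for line in text.splitlines():
        match = FIELD.match(line)
        if match:
            # group(2) is None when the marker has no value at all.
            fields.append((match.group(1), (match.group(2) or "").strip()))
        elif fields and line.strip():
            # A line without a leading backslash continues the previous field's value.
            marker, value = fields[-1]
            fields[-1] = (marker, (value + " " + line.strip()).strip())
    return fields


# Hypothetical entry fragment; the order of the fields is preserved.
sample = "\\rf 1\n\\xv vernacular example\n\\xe English\n  translation\n\\sfx example.wav"
print(parse_entry(sample))
```

Keeping the fields as a list rather than a dictionary preserves both their order and any repeated markers, which is what the rest of this section depends on.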

For LEXUS, however, it can be of great importance. Problems begin when an entry contains a sequence of markers and values that goes against the hierarchy defined in the .typ file. Let us return to the xv example. In Figure 3.1 the structure of the .typ file is presented in the black box, followed by the relevant part of the entry. As we can see, both follow the same pattern:


Figure 3.1. Hierarchy of markers and an entry that follows this pattern


That is, the translations and the sound file are placed under the appropriate vernacular example. If all entries follow this order, there will be no problems. Notice that the order of xn, xe and sfx does not matter: they are all defined under xv in the structure, so as long as they follow xv in the data file, their relative order is irrelevant. Suppose, however, that one entry in your data file has its markers and values typed in a different order. Figure 3.2 again shows the structure of the .typ file in the box, followed by the relevant part of the entry:


Figure 3.2. Hierarchy of markers and an entry that does not follow the pattern
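The difference between the two figures can be expressed as a simple ordering check: every marker must come after the marker it is defined under, while markers defined under the same parent may appear in any order. The sketch below assumes a hypothetical child-to-parent map mirroring the hierarchy described in the text (xv under rf; xn, xe and sfx under xv) and treats markers missing from the map as top-level. `PARENT_OF` and `follows_hierarchy` are illustrative names, not part of Toolbox or LEXUS.

```python
# Hypothetical child -> parent map mirroring the hierarchy in the text.
# Markers not listed here are treated as top-level in this sketch.
PARENT_OF = {"xv": "rf", "xn": "xv", "xe": "xv", "sfx": "xv"}


def follows_hierarchy(markers, parent_of=PARENT_OF):
    """Return True if every marker appears after the marker it is defined under."""
    seen = set()
    for marker in markers:
        parent = parent_of.get(marker)
        if parent is not None and parent not in seen:
            return False  # a child was reached before its parent
        seen.add(marker)
    return True


print(follows_hierarchy(["rf", "xv", "xn", "xe", "sfx"]))  # True
print(follows_hierarchy(["rf", "xv", "sfx", "xe", "xn"]))  # True: order under xv is irrelevant
print(follows_hierarchy(["rf", "sfx", "xv", "xn", "xe"]))  # False: sfx precedes xv
```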


At first sight this does not seem to matter much. After all, it is still clear that the sound file belongs to that particular example. However, you have to be aware of how LEXUS will treat such an entry. LEXUS reads the entries linearly, line by line, and fills the structure provided by the .typ file with information from the data file. Whenever it encounters a marker that has a value, it checks under which marker this marker was defined in the .typ file. It then looks back through the part of the entry that has already been created in LEXUS to see whether this higher marker has already appeared. If it has, the marker currently being analyzed is simply linked under it. Let us assume that the .typ file and the data file follow the same structure and that xv in our structure file is linked under rf, the reference group. Let us also assume that rf has already appeared and LEXUS has created a node for it.
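The look-back step can be sketched as follows. This is an illustration under stated assumptions, not LEXUS code: the table `DEFINED_UNDER`, the root node name `"entry"` and the function `locate_parent` are hypothetical, and the nodes already created for the entry are modelled simply as a list of node types in creation order.

```python
# Hypothetical structure table: the node each marker is defined under.
# "entry" stands for the entry's root node (an assumption for this sketch).
DEFINED_UNDER = {"rf": "entry", "xv": "rf",
                 "xn": "xv group", "xe": "xv group", "sfx": "xv group"}


def locate_parent(marker, created):
    """Return the index of the most recently created node that `marker`
    belongs under, or None if that node has not appeared in the entry yet."""
    wanted = DEFINED_UNDER[marker]
    # Look back from the newest node to the oldest.
    for index in range(len(created) - 1, -1, -1):
        if created[index] == wanted:
            return index
    return None


created = ["entry", "rf"]  # rf has already appeared, as assumed in the text
print(locate_parent("xv", created))   # 1: xv can be linked under rf
print(locate_parent("sfx", created))  # None: no xv group exists yet
```

Searching from the end means that, when the same node type occurs more than once, the marker is attached to the most recent occurrence.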

In the situation shown in Figure 3.1, LEXUS takes the following steps. When it encounters xv in the data file, it checks in the structure file where xv should be linked: the answer is under rf. As rf already exists in the structure of this entry, xv will be linked somewhere under rf. Recall, however, that xv has other nodes linked under it in the structure file. Therefore a group node (xv group) is created first and linked under rf, and xv itself is linked under that group node. The next marker LEXUS encounters is xe. Again LEXUS checks whether the node above it in the hierarchy (xv group) already exists in the structure. It does, as it has just been created, so xe is linked under it. This operation is repeated until all the relevant markers are linked under xv group. As a result, the following structure is created in LEXUS for that entry:


Figure 3.3. LEXUS structure for the entry that follows the structure of the .typ file


This is what we want in our lexicon. However, if the order of the markers in a particular entry differs from the .typ file, that is, in the situation presented in Figure 3.2, the result will be different. LEXUS will first encounter sfx, not xv. Again it checks under which node sfx is defined in the structure file; we already know that it is defined under xv group. However, no xv group has appeared in this entry yet, as xv comes later, after sfx. LEXUS will therefore create an xv group and link sfx under that group node. After that it will encounter xv and thus create another xv group with xv, xn and xe linked under it. In the end, LEXUS will create the following structure:


Figure 3.4. LEXUS structure for an entry that does not follow the structure of the .typ file
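Both walkthroughs, the entry that follows the .typ hierarchy and the one where sfx comes before xv, can be reproduced with a small simulation of the linear reading described above. It is a minimal sketch, not the LEXUS implementation: the structure table, the `"entry"` root node, the rule that a missing parent node is created on the spot, and all function names are assumptions modelled on the text.

```python
# Hypothetical structure: xv is defined under rf and has children, so it is
# imported inside an "xv group" node; xn, xe and sfx are defined under that group.
DEFINED_UNDER = {"rf": "entry", "xv": "rf",
                 "xn": "xv group", "xe": "xv group", "sfx": "xv group"}
GROUPED = {"xv"}  # markers with children in the structure get a group node


def import_entry(fields):
    """Build a tree for one entry by reading its (marker, value) pairs in order."""
    root = {"type": "entry", "value": None, "children": []}
    created = [root]  # all nodes created so far, in creation order

    def add(kind, value, parent):
        node = {"type": kind, "value": value, "children": []}
        parent["children"].append(node)
        created.append(node)
        return node

    def find_or_create(kind):
        # Look back for the most recent node of this kind...
        for node in reversed(created):
            if node["type"] == kind:
                return node
        # ...and if none exists yet, create an empty one under its own parent.
        owner = kind.removesuffix(" group")
        return add(kind, None, find_or_create(DEFINED_UNDER[owner]))

    for marker, value in fields:
        parent = find_or_create(DEFINED_UNDER[marker])
        if marker in GROUPED:
            # A marker with children always opens a new group node.
            parent = add(marker + " group", None, parent)
        add(marker, value, parent)
    return root


def xv_groups(tree):
    """Return the xv group nodes linked under the entry's rf node."""
    rf = next(c for c in tree["children"] if c["type"] == "rf")
    return [g for g in rf["children"] if g["type"] == "xv group"]


conforming = [("rf", "1"), ("xv", "example"), ("xe", "translation"),
              ("xn", "national translation"), ("sfx", "example.wav")]
out_of_order = [("rf", "1"), ("sfx", "example.wav"), ("xv", "example"),
                ("xn", "national translation"), ("xe", "translation")]

for fields in (conforming, out_of_order):
    groups = xv_groups(import_entry(fields))
    print([[child["type"] for child in g["children"]] for g in groups])
```

For the first entry this prints a single group, `[['xv', 'xe', 'xn', 'sfx']]`; for the second it prints `[['sfx'], ['xv', 'xn', 'xe']]`, i.e. one group holding only the sound file and another holding the example and its translations. (`str.removesuffix` requires Python 3.9 or later.)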


This can be very problematic, as the information that this sound file (sfx) belongs with this example and its translations (xv, xe, xn) is now lost. It is distributed between two different xv groups: one missing the sound file, the other missing the example and translations. This will always happen when a marker that is higher in the hierarchy of the structure file appears in the data file after the markers defined under it. If you want to import your data into LEXUS, you have to make sure that such situations do not occur in your data file. As a practical guideline, we therefore suggest keeping to the hierarchy of markers from your .typ file in your data file. This means placing each marker below (after) the marker under which it is defined, and never above it. If certain entries do not conform to the structure of the .typ file, they have to be changed manually, which simply means changing the order of the markers for that particular entry in the data file.
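The manual fix, moving markers so that each one follows the marker it is defined under, can be illustrated with a small reordering helper. This is a hypothetical sketch (`reorder_entry` and `PARENT_OF` are not part of Toolbox or LEXUS) assuming a simplified child-to-parent map that mirrors the hierarchy in the text: xv under rf; xn, xe and sfx under xv. It is meant to propose a corrected order; the result should still be checked by hand before changing the data file.

```python
# Hypothetical child -> parent map mirroring the hierarchy in the text.
PARENT_OF = {"xv": "rf", "xn": "xv", "xe": "xv", "sfx": "xv"}


def reorder_entry(fields, parent_of=PARENT_OF):
    """Move each (marker, value) pair that precedes its parent to just after it."""
    out = []       # fields in their repaired order
    seen = set()   # markers already placed
    pending = []   # fields read before the marker they are defined under

    def place(field):
        out.append(field)
        seen.add(field[0])
        # Fields that were waiting for this marker can now follow it.
        ready = [f for f in pending if parent_of.get(f[0]) == field[0]]
        for f in ready:
            pending.remove(f)
            place(f)

    for field in fields:
        parent = parent_of.get(field[0])
        if parent is not None and parent not in seen:
            pending.append(field)
        else:
            place(field)
    # Fields whose parent never appears are left at the end for manual review.
    return out + pending


entry = [("rf", "1"), ("sfx", "example.wav"), ("xv", "example"),
         ("xn", "national translation"), ("xe", "translation")]
print(reorder_entry(entry))
```

Here sfx ends up directly after xv, which is sufficient because the relative order of xn, xe and sfx under xv does not matter.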

Importantly, there is one partial exception to the rule by which LEXUS reads the data file. First, not all markers will be used in every entry, and it would be a waste of time to list all the unnecessary markers just for the sake of keeping to the hierarchy. This is not needed: LEXUS reads only the markers that are used in a particular entry, and builds a structure out of them according to the .typ file.

Second, what LEXUS does not see, it does not include. That is to say, it is not enough for a marker to appear in an entry for LEXUS to read it; it also has to have a value to be recognised. Any marker without a value is simply omitted.

This can work to your advantage: LEXUS will always create a coherent minimal structure out of the lexical entry according to the .typ pattern, so an empty marker can never lead to the complications described above. The question remains, however, why one would keep empty markers at all.
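Because empty markers are skipped, an empty marker placed out of order is harmless. The sketch below illustrates this under a simplified, hypothetical child-to-parent map mirroring the hierarchy in the text; `PARENT_OF`, `used_fields` and `follows_hierarchy` are illustrative names, not LEXUS functions.

```python
# Hypothetical child -> parent map mirroring the hierarchy in the text.
PARENT_OF = {"xv": "rf", "xn": "xv", "xe": "xv", "sfx": "xv"}


def used_fields(fields):
    """Drop markers without a value, as the text says LEXUS does."""
    return [(marker, value) for marker, value in fields if value.strip()]


def follows_hierarchy(markers, parent_of=PARENT_OF):
    """Return True if every marker appears after the marker it is defined under."""
    seen = set()
    for marker in markers:
        parent = parent_of.get(marker)
        if parent is not None and parent not in seen:
            return False
        seen.add(marker)
    return True


# An empty sfx typed before xv: out of order, but it carries no value.
entry = [("rf", "1"), ("sfx", ""), ("xv", "example"), ("xe", "translation")]
print(follows_hierarchy([m for m, _ in entry]))               # False
print(follows_hierarchy([m for m, _ in used_fields(entry)]))  # True
```

Once the empty sfx is discarded, the remaining markers follow the hierarchy, so no stray xv group would be created.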

Created by latadmin
Last modified 2009-12-10 14:41