3.1. Data file inconsistent with the hierarchy of the structure file
Few kinds of problems can arise from your data file, but the sheer size of a typical data file, which may contain hundreds of entries, makes it important to think in advance about the discrepancies that might arise when importing your data into LEXUS. There is, however, a very simple solution to all of them, an idea that should in any case underlie any lexical enterprise: consistency.
In the previous section we stressed the importance of having a clear idea of how you want to organize your markers. Once that has been accomplished, it is crucial to stick to that order in your data file. In practice, of course, such an order often emerges only gradually, as time goes by and more and more information about the lexicon becomes available. For Toolbox it does not make much difference whether you stick to the hierarchy, as the markers you want are simply listed separately for every entry.
For LEXUS, however, it can be of great importance. The problems begin when any of your entries contains a sequence of markers and values that contradicts the hierarchy defined in the .typ file. Let us come back to the example. In Figure 3.1 the structure of the .typ file is presented in the black box, followed by the relevant part of the entry. As we can see, they both follow the same pattern:
That is, the translations and the sound file are placed under the appropriate example in the vernacular. If all the entries follow this order, there will not be any problems. Notice that the order of the translation markers and the sound-file marker does not matter. This is because they are all defined under xv in the structure, and as long as they follow xv in the data file, their relative order is of no relevance. However, let us say that there is an entry in your data file in which you have typed the markers and their values in a different way. In Figure 3.2 the structure of the .typ file is again shown in the box, followed by the relevant part of the entry:
On the face of it, this does not seem to matter much. After all, it is still clear that the sound file refers to that particular example. However, you have to be aware of how LEXUS will treat such an entry. LEXUS reads the entries linearly, line by line, and fills the structure that the .typ file has provided with information from the data file. Whenever it encounters a marker that has a value, it checks under which marker this marker was defined in the .typ file. Then it looks back through the part of the entry that has already been created in LEXUS to see whether this higher marker has already appeared in the structure. If it has, the marker currently being analyzed is simply linked under it. Let us assume that the .typ file and the data file follow the same structure and that in our structure file xv is linked under rf. Let us also assume that rf has already appeared and LEXUS has created a node for it.
If the situation we are dealing with is the one shown in Figure 3.1, LEXUS will take the following steps. When it encounters xv in the data file, it will check in the structure file where xv should be linked; the answer is under rf. As rf already exists in the structure of this entry, xv will be linked somewhere under rf. Recall also that xv has other nodes linked under it in the structure file. Therefore, a group node for xv will be created first and linked under rf, and xv itself will be linked under that group node. The next marker that LEXUS encounters will be one of the markers defined under xv, for instance a translation. Again LEXUS will check whether the marker that is above it in the hierarchy (xv) already exists in the structure. It does, as it has just been created. In that case the new marker will be linked under it. This operation will be repeated until all the relevant markers of the example have been linked. As a result, the following structure will be created in LEXUS for that entry:
This is what we would like to have in our lexicon. However, if the order of the markers in a particular entry is not as in the .typ file, that is, if we are dealing with the situation presented in Figure 3.2, the result will be different. LEXUS will first encounter the sound-file marker and not xv. Again it will check under which node the sound-file marker is defined in the structure file. We already know that it is defined under xv. However, xv has not appeared in this entry yet, as xv is placed later, after the sound-file marker. LEXUS will therefore create a new group node for xv and link the sound-file marker under that group node. After that it will encounter xv and thus create another group node, with xv and the translations linked under it. In the end the structure that LEXUS will create will be the following:
This can be very problematic, as the information that this sound file goes with these translations is now lost: it is distributed between two different group nodes, one missing a sound file, the other missing the translation information. Such situations will always arise if a marker that is higher in the hierarchy of the structure file appears in the data file below the markers that are defined under it. If you want to import your data into LEXUS, you have to make sure that such situations do not occur in your data file. As a practical guideline, we therefore suggest sticking to the hierarchy of markers from your .typ file in your data file. This means placing a marker right under the marker under which it was defined, and never above it. If, however, certain entries do not conform to the structure of the .typ file, they have to be changed manually. This simply means changing the order of the markers for that particular entry in the data file.
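The linear reading described above can be illustrated with a small Python sketch. It is a simplified model written for this explanation, not the actual LEXUS import code. Only rf and xv are taken from the text; the marker names xe (a translation) and sf (the sound file), as well as the function name build_groups, are placeholders assumed here.

```python
# Simplified model of the linear reading described above.
# ASSUMPTION: "xe" (translation) and "sf" (sound file) are placeholder
# marker names; only "rf" and "xv" appear in the text itself.
GROUP_HEAD = "xv"             # marker that has other markers defined under it
GROUP_MEMBERS = {"xe", "sf"}  # markers defined under the head in the .typ file


def build_groups(entry):
    """Read (marker, value) pairs in order and return the example groups.

    Each group is a list of (marker, value) pairs. Markers that are
    neither the head nor a member (for example "rf") are ignored here.
    """
    groups = []
    for marker, value in entry:
        if marker == GROUP_HEAD:
            # The head always opens a new group node.
            groups.append([(marker, value)])
        elif marker in GROUP_MEMBERS:
            if not groups:
                # No head has appeared yet: a group node is created
                # anyway, and it will never receive a head.
                groups.append([])
            # Link the member under the most recent group node.
            groups[-1].append((marker, value))
    return groups


# Entry that follows the hierarchy (as in Figure 3.1).
ordered = [("rf", "ref 1"), ("xv", "example"), ("xe", "translation"), ("sf", "ex.wav")]
# Entry with the sound file placed before the example (as in Figure 3.2).
misordered = [("rf", "ref 1"), ("sf", "ex.wav"), ("xv", "example"), ("xe", "translation")]

print(len(build_groups(ordered)))     # 1
print(len(build_groups(misordered)))  # 2
```

For the ordered entry the sketch yields a single group containing the example, its translation and its sound file. For the misordered entry it yields two groups: one holding only the sound file, and one holding the example and its translation. An entry whose first group lacks the head marker is therefore a candidate for manual reordering.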
Importantly, there is one partial exception to the rule by which LEXUS reads the data file. Of course, not all markers will be used in every entry, and it would be a waste of time to list all the unnecessary markers just for the sake of keeping to the hierarchy. LEXUS, however, reads only the markers that are actually used in a particular entry, and builds a structure out of them according to the .typ file.
In addition, what LEXUS does not see, it does not include. That is to say, it is not enough for a marker to appear in an entry for LEXUS to read it. It also has to have a value to be recognised by LEXUS. Whenever a marker has no value, it is simply omitted.
This can work to your advantage: LEXUS will always create a coherent minimal structure out of the lexical entry according to the .typ pattern. An empty marker can therefore never lead to the aforementioned complications. The question remains, however, why one would keep empty markers at all.
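The omission of empty markers can be pictured with the following Python sketch. It is an illustration only and makes no claim about how LEXUS is implemented. It assumes the usual Toolbox convention of one backslash-prefixed marker per line; the function name parse_entry and the sample markers lx, ps and xe are chosen for this example, and continuation lines of multi-line values are ignored for brevity.

```python
def parse_entry(text):
    """Split one entry into (marker, value) pairs, dropping empty markers.

    ASSUMPTION: each field sits on its own line in the form
    "\\marker value". Lines that do not start with a backslash are skipped.
    """
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("\\"):
            continue
        # Everything up to the first space is the marker name.
        marker, _, value = line[1:].partition(" ")
        value = value.strip()
        if value:
            # Only markers that carry a value are kept.
            pairs.append((marker, value))
    return pairs


sample = "\\lx dog\n\\ps\n\\xv example sentence\n\\xe \n"
print(parse_entry(sample))
```

In this sample the markers ps and xe have no value, so only lx and xv survive and would take part in building the structure.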