Configuring the program
Format of the configuration file
The configuration file follows many other Unix and Windows configuration files in that:
When a value contains dots that are not enclosed in "quotes", the strings on either side of the dots are interpreted as subkeys. For example, the value word.surface represents the subkey "word" with the value "surface", and the value word.surface."/home/myname/Blah.map" represents the subkey "word" with the subsubkey "surface", followed by the value "/home/myname/Blah.map".
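The dot rule above can be sketched in a few lines of Python. This is only an illustration of the rule as described here, not the Emdros Query Tool's actual parser; the function name split_value is hypothetical, and the sketch assumes that quotes serve only as delimiters and cannot themselves be escaped.

```python
def split_value(value):
    """Split a configuration value on dots that are outside double quotes.

    Hypothetical helper: quotes are treated as delimiters and are dropped
    from the result; dots inside quotes are kept as ordinary characters.
    """
    parts = []         # completed subkeys, followed by the final value
    current = []       # characters of the part currently being built
    in_quotes = False
    for ch in value:
        if ch == '"':
            in_quotes = not in_quotes
        elif ch == "." and not in_quotes:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


print(split_value('word.surface'))
print(split_value('word.surface."/home/myname/Blah.map"'))
```

The second call keeps the dot in "Blah.map" because it sits inside the quotes, which is why paths generally need quoting.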
Here is a sample configuration file, explained bit by bit:
# database
database = mydb
You can specify a database that is always to be used with this configuration file (unless overridden with the -d switch to eqtc).
If using SQLite 2 or SQLite 3, you may wish to specify a path. Do so in quotes:
# Database path. You can place it anywhere you want, so long
# as you abide by the rules of your operating system. For
# example, on Windows, do not place any "changing" data,
# such as an Emdros database, underneath
# C:\Program Files\.
database = "C:\Users\yourusername\Documents\Emdros\mydb.sqlite3"
# rasterising unit
raster_unit = clause
The Emdros Query Tool operates with a notion of "rasterising unit". That is the unit to be displayed on one line. For example, if your query returns a number of words, then, in the example above, all clauses that contain at least one of those words will be fetched and displayed.
There can only be one rasterising unit.
# raster context
raster_context_before = 10
raster_context_after = 10
The "raster_unit" can be replaced with "so many monads of context" (before and after a hit). If a raster_unit is specified, it will take priority. If a raster unit is not specified, then both of the raster_context_before / raster_context_after values must be present.
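The priority rule between raster_unit and the two context keys can be written out as a small check. The following is a sketch of the rule as stated above, not code from the Emdros Query Tool; the function name resolve_raster and the use of a plain dictionary for the parsed configuration are assumptions made for illustration.

```python
def resolve_raster(config):
    """Decide how hits are rasterised, following the rule described above.

    `config` is a hypothetical dict of already-parsed keys and values.
    """
    unit = config.get("raster_unit")
    if unit is not None:
        # A raster unit takes priority over any context settings.
        return ("unit", unit)
    before = config.get("raster_context_before")
    after = config.get("raster_context_after")
    if before is None or after is None:
        raise ValueError(
            "without raster_unit, both raster_context_before and "
            "raster_context_after must be present"
        )
    return ("context", int(before), int(after))


print(resolve_raster({"raster_context_before": "10", "raster_context_after": "10"}))
```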
# data units
data_unit = clause
data_unit = phrase
data_unit = word
data_feature = word.surface
data_feature = word.psp
data_feature = phrase.phrase_type
data_feature = phrase.function # You can have more than one
data_unit_name = clause."Cl"
data_left_boundary = phrase.OPEN_BRACKET # Specifies left boundary marker
data_right_boundary = phrase.CLOSE_BRACKET # Specifies right boundary marker
The data units are the units to be displayed in each rasterising line. They can be anything, and need not be words.
You must specify which feature(s) to display for each data unit. The feature-names must be prefixed with the name of the data unit plus a dot, as in the example above.
The capitalisation must be exactly the same as the value for the "data_unit" key. For example, if you said "data_unit = phrase", then you must also say "data_feature = phrase.phrase_type", not "Phrase.phrase_type".
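Because the prefix comparison is case-sensitive, a mistyped prefix is easy to overlook. A minimal sketch of a pre-flight check is shown below; the function name undeclared_feature_units is hypothetical and the check is not part of the Emdros Query Tool.

```python
def undeclared_feature_units(data_units, data_features):
    """Return the data_feature values whose unit prefix does not exactly
    match (including capitalisation) any declared data_unit."""
    declared = set(data_units)
    bad = []
    for feature in data_features:
        unit = feature.split(".", 1)[0]   # text before the first dot
        if unit not in declared:
            bad.append(feature)
    return bad


print(undeclared_feature_units(
    ["phrase", "word"],
    ["phrase.phrase_type", "Phrase.function", "word.surface"],
))
```

Here only "Phrase.function" is reported, since "Phrase" differs in capitalisation from the declared "phrase".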
There can be more than one data unit. If so, they should be specified in the order from largest to smallest (e.g., clause, phrase, word). This will give the "output" output style (see below) a hint as to how to print things in the right order.
You can optionally specify "boundary markers" that will be printed at the left and right boundaries of a unit respectively. The strings to be printed can be taken from the following table:
The "data_unit_name" key gives, for a given object type, a string which will appear above all the other data_features (if any). In the above example, the clause unit is given a "Cl" label.
Finally, in the graphical version of the Emdros Query Tool, it is possible to have an interlinear display. The order of the lines in the interlinear display is the same as the data_feature keys. The number of lines is equal to the number of features for the data unit for which the most data_feature keys are given, plus the number of data_unit_name keys for that unit.
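The line-count rule can be made concrete with a short calculation. This is a sketch under stated assumptions: the function name interlinear_line_count is hypothetical, and when two units tie for the most data_feature keys the sketch simply takes the one listed first, which the text above does not specify.

```python
from collections import Counter


def interlinear_line_count(data_features, data_unit_names):
    """Number of interlinear lines, following the rule described above."""
    # Count data_feature keys per unit (the text before the first dot).
    per_unit = Counter(f.split(".", 1)[0] for f in data_features)
    if not per_unit:
        return 0
    # Assumption: ties are resolved in favour of the unit seen first.
    unit, feature_count = per_unit.most_common(1)[0]
    name_count = sum(1 for n in data_unit_names if n.split(".", 1)[0] == unit)
    return feature_count + name_count


features = ["word.surface", "word.psp", "phrase.phrase_type", "phrase.function"]
names = ['clause."Cl"']
print(interlinear_line_count(features, names))
```

With the sample configuration, "word" has two features and no data_unit_name, so the sketch yields two lines.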
# surface
data_feature_teckit_mapping = word.surface."e:\TECkit\mymap.map"
data_feature_teckit_in_encoding = word.surface.bytes
data_feature_teckit_out_encoding = word.surface.unicode
# lemma
data_feature_teckit_mapping = word.lemma."e:\TECkit\mymap.map"
data_feature_teckit_in_encoding = word.lemma.bytes
data_feature_teckit_out_encoding = word.lemma.unicode
TECkit is a tool made by SIL International. It converts between encodings, in particular to and from Unicode. The Emdros Query Tool incorporates TECkit, and you can apply it to any textual feature of any object type.
TECkit works with a so-called "map file" -- a text file which you or someone else writes. More information about writing TECkit mappings can be found on SIL's website:
The Emdros Query Tool needs three pieces of information in order for TECkit to work on a particular feature:
TECkit can not only convert between encodings, but also remove characters from a string. This can come in handy when your feature-strings contain characters which you do not wish to display. Again, see the TECkit pages on SIL's website for information on how to write a TECkit mapping.
You should give first the object type, then a dot, then the feature-name, then a dot, then the full path to the map file. You probably need to enclose the path in "double quotes".
You can only have one TECkit mapping per feature.
# reference units
reference_unit = verse
reference_feature = verse.book
reference_feature = verse.chapter
reference_feature = verse.verse
reference_sep = SPACE # between book and chapter
reference_sep = COMMA # between chapter and verse
If you have a unit in your database which somehow identifies the position in the document, or an ID, you can display these units at the left of each line. The canonical example is the Biblical system of book-chapter-verse, but in many corpora, there will be a unit identifying, e.g., which newspaper article something came from.
In the above example, verse is the reference unit, and three features are fetched, namely book, chapter, verse. The order in which they are specified in the configuration file is the order in which they will be emitted.
If there is more than one reference unit feature, you must specify the separators to separate them. In the above example "SPACE" will be emitted between "book" and "chapter", and "COMMA" will be emitted between the chapter and the verse (again, the order matters). See the table above for some possibilities of using special characters.
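How the features and separators interleave can be sketched as follows. This is an illustration only: the function name format_reference is hypothetical, the mapping of SPACE to " " and COMMA to "," is an assumption (the authoritative strings are those in the separator table), and the sample values "Genesis", 1, 1 are made up.

```python
# Assumed rendering of the separator names; consult the table for the
# strings actually emitted by the Emdros Query Tool.
SEPARATORS = {"SPACE": " ", "COMMA": ","}


def format_reference(values, separator_names):
    """Join feature values with separators, in configuration-file order."""
    if not values:
        return ""
    if len(separator_names) != len(values) - 1:
        raise ValueError("need exactly one separator between each pair of features")
    pieces = [str(values[0])]
    for name, value in zip(separator_names, values[1:]):
        pieces.append(SEPARATORS[name])
        pieces.append(str(value))
    return "".join(pieces)


# book, chapter, verse with SPACE then COMMA, as in the sample above
print(format_reference(["Genesis", 1, 1], ["SPACE", "COMMA"]))
```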
There can be only one reference unit.
#output_style = kwic
#output_style = tree
#output_style = xml
output_style = output
Specifies which implementation to use for emitting solutions. Currently, three kinds of output style are implemented:
Data tree parent
# Tree parent feature.
# If output_style = tree, then it is assumed that
# there is a feature on all relevant data units which gives the
# id_d of the parent. That is, each child node in the tree
# must have a feature which provides the id_d of its parent.
# If a data_unit is provided which does not have a data_tree_parent,
# then that data_unit *must* contain the top-most nodes in the tree.
data_tree_parent = clause.parent
data_tree_parent = phrase.parent
data_tree_parent = word.parent
If "output_style" is set to "tree", then this option specifies, for each terminal and non-terminal in the tree, what feature gives the parent of the node. Note that this feature must have type "id_d", and the value must point to the id_d of the parent node.
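To see why a parent feature is sufficient to lay out a tree, consider the sketch below, which turns child-to-parent pointers into parent-to-children lists. It is not the Emdros Query Tool's layout code; the function name build_children is hypothetical, the id_d values are made up, and representing "no parent" as None is an assumption of this sketch.

```python
from collections import defaultdict


def build_children(parent_of):
    """Invert a {node id_d: parent id_d} mapping.

    Returns (roots, children), where `roots` lists nodes without a parent
    and `children` maps each parent id_d to its child id_ds.
    """
    children = defaultdict(list)
    roots = []
    for node, parent in parent_of.items():
        if parent is None:          # assumed marker for a top-most node
            roots.append(node)
        else:
            children[parent].append(node)
    return roots, dict(children)


# One clause (1) with two phrases (2, 3); phrase 2 contains word 4.
roots, children = build_children({1: None, 2: 1, 3: 1, 4: 2})
print(roots, children)
```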
Tree terminal unit
# Tree terminal unit.
# If output_style = tree, then the Emdros Query Tool needs to know
# which object types are terminals (i.e., leaf nodes in the tree)
# and which object types are non-terminals. This is done by
# designating *one* (1) data_unit to be the data_tree_terminal_unit.
# The rest of the data_units will then be non-terminals.
data_tree_terminal_unit = word
This option tells the tree layout code which data_unit contains the terminals. Note that the Emdros Query Tool assumes that terminals and non-terminals are different object types. There may be more than one non-terminal object type, but only one terminal object type. The non-terminal object types are determined from the data_unit option.
Object Type Name as Tree node name
# Tree nonterminal unit name.
# If and only if this is set to "true" (without the quotes),
# use the object type name as the node name in the tree.
#
# otherwise, it is advisable to add data_feature entries for all
# nonterminal units, which will then be shown.
#
# You can set this to "true" and still use data_feature -- the features
# will then be added below the object type name.
#
data_tree_object_type_name_for_nonterminals = true
This option tells the tree layout code to add the object type name of each nonterminal as the first line in each node box. If set to "true", this is what is done. If set to anything else, or if not set, it is not done.
# hit type
# hit_type must be one of:
# focus
# innermost
# innermost_focus
# outermost
hit_type = outermost
The hit type determines how the sheaf is interpreted. There are four available options:
If none of these are specified, then "outermost" is assumed as the default.
# display options
option = apply_focus
option = break_after_raster
option = quiet
option = single_raster_units
You can have these options:
input_area_font_name = "Arial MS Unicode"
input_area_font_size = 11 # in points
output_area_font_name_1 = "SPIonic"
output_area_font_name_2 = "Courier New"
output_area_font_name_3 = "Times New Roman"
output_area_magnification = 100 # in percent (%)
You can set the default font name and font size (in points) for the input area.
You cannot set the font size in points for the output area. Instead, you can set it to a percentage of 12 point. For example, setting output_area_magnification to 150 will select a font size of 18 points, and setting it to 200 will select a font size of 24 points.
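The relationship between magnification and point size is a simple proportion of the 12 point base. A minimal sketch follows; the function name output_font_size is hypothetical.

```python
BASE_POINT_SIZE = 12  # the base size stated above


def output_font_size(magnification_percent):
    """Point size selected by a given output_area_magnification value."""
    return BASE_POINT_SIZE * magnification_percent / 100


for magnification in (100, 150, 200):
    print(magnification, output_font_size(magnification))
```

Going the other way, a desired point size p corresponds to a magnification of p / 12 * 100.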