Configuring the program

Format of the configuration file

The configuration file follows many other Unix and Windows configuration files in that:
When a value has dots that are not enclosed in "quotes", the strings on either side of the dots are interpreted as subkeys. For example, the value word.surface represents the subkey "word" with the value "surface", and the value word.surface."/home/myname/Blah.map" represents the subkey "word" with the subsubkey "surface", followed by the value "/home/myname/Blah.map".

Here is a sample configuration file, explained bit by bit:

Database selection

    # database
    database = mydb

You can specify a database that is always to be used with this configuration file. If using SQLite 2 or SQLite 3, you may wish to specify a path. Do so in quotes:

    # Database path. You can place it anywhere you want, so long
    # as you abide by the rules of your operating system. For
    # example, on Windows, do not place any "changing" data,
    # such as an Emdros database, underneath
    # C:\Program Files\.
    database = "C:\Users\yourusername\Documents\Emdros\mydb.sqlite3"

Data unit

    # data unit
    # There can only be one data unit
    # but it can have as many data_features as you like.
    # Each data_feature will go on its own interlinear line.
    #
    data_unit = word
    data_feature = graphical_word
    data_feature = graphical_lexeme

The data unit is the basic unit that will result in one box in the chunking area. It can be any object type and need not be a word. However, you probably want it to be a word or a word-like object; the choice depends on how large a segment you want to be able to chunk at a time. You must specify which feature(s) to display for the data unit. There can only be one data unit.

TECkit mappings

    # TECkit
    #
    # data_feature_teckit_mapping defines what TECkit map to use
    # for a given data_feature.
    #
    # data_feature_teckit_in_encoding specifies the in_encoding ("bytes"
    # or "unicode") for the given data_feature.
    #
    # data_feature_teckit_out_encoding specifies the out_encoding ("bytes"
    # or "unicode") for the given data_feature.
    #
    data_feature_teckit_mapping = graphical_word."Amsterdam.map"
    data_feature_teckit_in_encoding = graphical_word.bytes
    data_feature_teckit_out_encoding = graphical_word.unicode

TECkit is a tool made by SIL International. It converts between encodings, in particular to and from Unicode. The Emdros Chunking Tool incorporates TECkit, and you can apply it to any textual feature of any object type.

TECkit works with a so-called "map file", a text file which you or someone else writes. More information about writing TECkit mappings can be found on SIL's website: http://scripts.sil.org/TECkit/

The Emdros Chunking Tool needs three pieces of information in order for TECkit to work on a particular feature: the map file to use (data_feature_teckit_mapping), the input encoding (data_feature_teckit_in_encoding), and the output encoding (data_feature_teckit_out_encoding).
TECkit can not only convert between encodings, but also remove unwanted characters from a string. This is handy when your feature strings contain characters which you do not wish to display. Again, see the TECkit pages on SIL's website for information on how to write a TECkit mapping.

You should give first the object type, then a dot, then the feature-name, then a dot, then the full path to the map file. You probably need to enclose the path in "double quotes", since a path usually contains dots. You can only have one TECkit mapping per feature.

Options

    # Options
    #
    # The only option available is 'right_to_left', which, if set,
    # will cause the chunking area to run right to left rather than
    # left to right.
    option = right_to_left

Display options

    # Fonts -- chunking area font names.
    # If you give more than one chunking_area_font_name,
    # they will be assigned to individual data_feature interlinear
    # lines, in the same order as the data_feature keys appear.
    #
    # If you give fewer keys here than you have data_feature keys,
    # then the last one will be used for the ones that aren't assigned
    # an explicit value.
    #
    # If you give no values for this key, then some sensible default
    # font will be used.
    #
    chunking_area_font_name = "Ezra SIL"
    chunking_area_font_name = "Courier"
    chunking_area_font_name = "Ezra SIL"
    #
    # The magnification (in percent) of the chunking area.
    # 100 corresponds approximately to a font size of 12 points.
    #
    chunking_area_magnification = 120
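
Two of the rules on this page are easier to follow with a small worked example: the splitting of a value on dots that are not inside quotes, and the assignment of chunking_area_font_name values to data_feature lines. The Python below is only an illustrative sketch of those rules as they are described here; it is not the Emdros Chunking Tool's own parser. The function names split_dotted_value and assign_fonts, and the placeholder default font name, are hypothetical. The sketch also assumes that a quote character merely toggles quoting and is never escaped, and that font names beyond the number of data_feature lines are simply unused.

```python
def split_dotted_value(value):
    """Split a configuration value on dots that are outside double quotes.

    Assumption: a double quote only toggles quoting and is dropped from the
    result; no backslash or other escape processing is performed.
    """
    parts = []
    current = []
    in_quotes = False
    for ch in value:
        if ch == '"':
            in_quotes = not in_quotes
        elif ch == "." and not in_quotes:
            # An unquoted dot ends the current subkey.
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def assign_fonts(feature_count, font_names, default_font="default"):
    """Return one font name per data_feature line, in order.

    If fewer fonts than features are given, the last font is reused.
    If no fonts are given, the (hypothetical) default_font is used.
    """
    if not font_names:
        return [default_font] * feature_count
    last = len(font_names) - 1
    return [font_names[min(i, last)] for i in range(feature_count)]


if __name__ == "__main__":
    print(split_dotted_value('word.surface."/home/myname/Blah.map"'))
    # ['word', 'surface', '/home/myname/Blah.map']
    print(assign_fonts(3, ["Ezra SIL", "Courier"]))
    # ['Ezra SIL', 'Courier', 'Courier']
```

Under these assumptions, the quoted map-file path stays in one piece even though it contains a dot, which is why the path normally needs double quotes.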