Configuring the program

Format of the configuration file

The configuration file follows the conventions of many other Unix and Windows configuration files:

Comments are prefixed by #, and anything from the # to the end of the line is ignored.
Blank lines are ignored.
The rest is a number of "key = value" pairs.
The keys are pre-defined (see below).
The values are either "quote-enclosed strings" (e.g., "C:\Emdros\mymap.map") or consist of letters, numbers, underscores, and/or dots, optionally followed by a "quote-enclosed string" (e.g., 'word.surface', or 'word.surface."C:\Documents and Settings\Administrator\teckitmap.map"').
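
The rules above can be sketched in a few lines of Python. This is only an illustration of the format, not the tool's own parser: the function name parse_config is hypothetical, and the sketch assumes that a # always starts a comment, even inside quotes, which the rules above do not spell out.

```python
def parse_config(text):
    """Return the (key, value) pairs of a configuration text, in file order.

    Assumption: '#' always starts a comment, even inside quotes.
    Repeated keys (e.g. several data_feature lines) are kept as
    separate pairs.
    """
    pairs = []
    for raw_line in text.splitlines():
        # Drop everything from '#' to the end of the line.
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue  # blank line or comment-only line
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError("not a 'key = value' line: %r" % raw_line)
        pairs.append((key.strip(), value.strip()))
    return pairs


sample = """\
# database
database = mydb

data_unit            = word
data_feature         = graphical_word
data_feature         = graphical_lexeme
"""

for key, value in parse_config(sample):
    print(key, "=", value)
```

Keeping the pairs in a list rather than a dictionary preserves both repeated keys and their order, which matters for keys such as data_feature.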

When a value contains dots that are not enclosed in "quotes", the strings on either side of the dots are interpreted as subkeys. For example, the value 'word.surface' represents the subkey "word" with the value "surface", and the value 'word.surface."/home/myname/Blah.map"' represents the subkey "word" with the subsubkey "surface", followed by the value "/home/myname/Blah.map".
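
As a rough illustration of how such a value breaks down, the following Python sketch separates the dot-delimited parts from an optional trailing quoted string. The function name split_value is hypothetical, and the sketch assumes that a value contains at most one quoted string, placed at the end.

```python
def split_value(value):
    """Split a value into (dotted_parts, quoted_string_or_None).

    Assumption: at most one "quote-enclosed string", at the end.
    Dots inside the quotes are left alone.
    """
    quoted = None
    bare = value
    quote_pos = value.find('"')
    if quote_pos != -1:
        if len(value) - quote_pos < 2 or not value.endswith('"'):
            raise ValueError("unterminated quoted string: %r" % value)
        quoted = value[quote_pos + 1:-1]
        bare = value[:quote_pos]
    # Split the unquoted part on dots, ignoring the empty piece left
    # by the dot that precedes the quoted string.
    parts = [part for part in bare.split(".") if part]
    return parts, quoted


print(split_value('word.surface'))
print(split_value('word.surface."/home/myname/Blah.map"'))
print(split_value(r'"C:\Emdros\mymap.map"'))
```

The first call yields the parts "word" and "surface" with no quoted string; the second yields the same parts plus the path; the third yields no parts at all, only the quoted path.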

Here is a sample configuration file, explained bit by bit:

Database selection

# database
database = mydb

You can specify a database that is always to be used with this configuration file.

If using SQLite 2 or SQLite 3, you may wish to specify a path. Do so in quotes:

# Database path. You can place it anywhere you want, so long
# as you abide by the rules of your operating system. For
# example, on Windows, do not place any "changing" data,
# such as an Emdros database, underneath 
# C:\Program Files\.
database = "C:\Users\yourusername\Documents\Emdros\mydb.sqlite3"

Data unit

# data unit
# There can only be one data unit
# but it can have as many data_features as you like.
# Each data_feature will go on its own interlinear line.
# 

data_unit            = word
data_feature         = graphical_word
data_feature         = graphical_lexeme

The data unit is the basic unit of chunking: each object of this type results in one box in the chunking area. It can be any object type and need not be a word. However, you will probably want words or word-like objects. The choice depends on how large you want the segments to be that you chunk at a time.

You must specify which feature(s) to display for the data unit.

There can only be one data unit.

TECkit mappings

# TECkit
#
# data_feature_teckit_mapping defines what TECkit map to use
# for a given data_feature.
#
# data_feature_teckit_in_encoding specifies the in_encoding ("bytes" 
# or "unicode") for the given data_feature.
#
# data_feature_teckit_out_encoding specifies the out_encoding ("bytes"
# or "unicode") for the given data_feature.
# 
data_feature_teckit_mapping      = graphical_word."Amsterdam.map"
data_feature_teckit_in_encoding  = graphical_word.bytes
data_feature_teckit_out_encoding = graphical_word.unicode

TECkit is a tool made by SIL International. It converts between encodings, in particular to and from Unicode. The Emdros Chunking Tool incorporates TECkit, and you can apply it to any textual feature of any object type.

TECkit works with a so-called "map file" -- a text file which you or someone else writes. More information about writing TECkit mappings can be found on SIL's website:

http://scripts.sil.org/TECkit/

The Emdros Chunking Tool needs three pieces of information in order for TECkit to work on a particular feature:

The name of the file which holds the mapping. This is given with the key "data_feature_teckit_mapping".
The input encoding (encoding of the feature-string): This is given with the key "data_feature_teckit_in_encoding". The value can be either "bytes" or "unicode" (without the quotes). "bytes" means that TECkit does not convert to UTF-8. "unicode" means it is converted to UTF-8 for display. You should use whatever is used in the map file for input encoding here.
The output encoding (encoding to transform into): This is given with the key "data_feature_teckit_out_encoding". The same meanings and restrictions apply as for the input encoding.

TECkit can not only convert between encodings but also remove characters from a string. This is handy when your feature-strings contain characters which you do not wish to display. Again, see the TECkit pages on SIL's website for information on how to write a TECkit mapping.

You should give first the object type, then a dot, then the feature-name, then a dot, then the full path to the map file. You probably need to enclose the path in "double quotes".

You can only have one TECkit mapping per feature.
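
The requirements above (three pieces of information per feature, and at most one mapping per feature) can be checked with a short Python sketch. It is an illustration under stated assumptions, not the tool's own logic: the function name teckit_settings is hypothetical, the input is assumed to be a list of (key, value) pairs already read from the configuration file, and everything before the final dot-separated piece of a value is taken to identify the feature.

```python
PREFIX = "data_feature_teckit_"
SLOTS = ("mapping", "in_encoding", "out_encoding")


def teckit_settings(pairs):
    """Group the TECkit keys by feature and check that each is complete."""
    settings = {}
    for key, value in pairs:
        if not key.startswith(PREFIX):
            continue  # not a TECkit key
        slot = key[len(PREFIX):]
        if slot not in SLOTS:
            raise ValueError("unknown TECkit key: %s" % key)
        if slot == "mapping":
            # Feature, then a dot, then the quoted map file name.
            feature, sep, rest = value.partition('."')
            if not sep or not rest.endswith('"'):
                raise ValueError("expected feature.\"file\": %r" % value)
            setting = rest[:-1]
        else:
            # Feature, then a dot, then bytes or unicode.
            feature, _, setting = value.rpartition(".")
            if setting not in ("bytes", "unicode"):
                raise ValueError("encoding must be bytes or unicode: %r" % value)
        entry = settings.setdefault(feature, {})
        if slot in entry:
            raise ValueError("more than one %s for %s" % (slot, feature))
        entry[slot] = setting
    for feature, entry in settings.items():
        missing = [slot for slot in SLOTS if slot not in entry]
        if missing:
            raise ValueError("%s lacks %s" % (feature, ", ".join(missing)))
    return settings


pairs = [
    ("data_feature_teckit_mapping", 'graphical_word."Amsterdam.map"'),
    ("data_feature_teckit_in_encoding", "graphical_word.bytes"),
    ("data_feature_teckit_out_encoding", "graphical_word.unicode"),
]
print(teckit_settings(pairs))
```

Raising an error on a second mapping for the same feature mirrors the one-mapping-per-feature restriction; how the tool itself reacts to such a file is not described here.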

Options

# Options
#
# The only option available is 'right_to_left', which, if set,
# will cause the chunking area to run right to left rather than
# left to right.
option = right_to_left

Display options

# Fonts -- chunking area font names.
# If you give more than one chunking_area_font_name,
# they will be assigned to individual data_feature interlinear
# lines, in the same order as the data_feature keys appear.
#
# If you give fewer keys here than you have data_feature keys,
# then the last one will be used for the ones that aren't assigned
# an explicit value.
#
# If you give no values for this key, then some sensible default
# font will be used.
#
chunking_area_font_name = "Ezra SIL"
chunking_area_font_name = "Courier"
chunking_area_font_name = "Ezra SIL"

#
# The magnification (in percent) of the chunking area.
# 100 corresponds approximately to a font size of 12 points.
#
chunking_area_magnification  = 120
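
The font-assignment rule described in the comments above, together with the approximate relation between magnification and point size, can be sketched in Python as follows. The function names assign_fonts and approx_point_size are hypothetical, and the default font is represented by None because the text only promises "some sensible default".

```python
def assign_fonts(data_features, font_names, default_font=None):
    """Pair each data_feature line with a font name, in order.

    Extra font names are ignored; if there are too few, the last one
    is reused; if there are none, default_font is used throughout.
    """
    assigned = []
    for index, feature in enumerate(data_features):
        if not font_names:
            font = default_font
        elif index < len(font_names):
            font = font_names[index]
        else:
            font = font_names[-1]
        assigned.append((feature, font))
    return assigned


def approx_point_size(magnification):
    """Approximate font size in points: 100 percent is roughly 12 points."""
    return 12 * magnification / 100


features = ["graphical_word", "graphical_lexeme"]
fonts = ["Ezra SIL", "Courier", "Ezra SIL"]
for feature, font in assign_fonts(features, fonts):
    print(feature, "->", font)
print(approx_point_size(120))
```

With the sample values, graphical_word is paired with "Ezra SIL" and graphical_lexeme with "Courier"; the third font name goes unused because there are only two data_feature lines. A magnification of 120 works out to roughly 14.4 points.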