What does the example do?

Overview

The example can do these things:

  1. It can load an Emdros database with a text, initializing the database for use with the example.
  2. It can build HAL-spaces of the text in the database.
  3. It can load a previously generated HAL-space from the database.
  4. It can emit two kinds of output:
    1. A complete HAL-space as a Comma-Separated-Value (CSV) text file suitable for loading into a spreadsheet.
    2. Selected parts of a HAL-space, based on the word-forms you are interested in.
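To make HAL-space building and the CSV output more concrete, here is a minimal Python sketch. It is not the example's own code, which works against an Emdros database. It assumes that the text is already a list of word tokens and that the matrix is stored as nested dictionaries. It also assumes that a word occurring d positions before another word (1 <= d <= n) contributes the weight n - d + 1, as in the original HAL model. The function names build_hal_space and write_csv are hypothetical.

```python
import csv
import io
from collections import defaultdict


def build_hal_space(tokens, n):
    """Build a HAL-space as nested dicts: matrix[w][v] is the summed weight
    of v occurring within n words before w.

    Assumption: a co-occurrence at distance d (1 <= d <= n) gets the classic
    HAL weight n - d + 1; the Emdros example may weight differently.
    """
    matrix = defaultdict(lambda: defaultdict(int))
    for i, w in enumerate(tokens):
        # Look back at most n positions, stopping at the start of the text.
        for d in range(1, n + 1):
            if i - d < 0:
                break
            matrix[w][tokens[i - d]] += n - d + 1
    return matrix


def write_csv(matrix, vocabulary, out):
    """Write the whole HAL-space as CSV: a header row, then one row per word."""
    writer = csv.writer(out)
    writer.writerow([""] + vocabulary)
    for w in vocabulary:
        # .get() avoids inserting empty entries into the defaultdicts.
        writer.writerow([w] + [matrix.get(w, {}).get(v, 0) for v in vocabulary])


if __name__ == "__main__":
    tokens = "the cat sat on the mat".split()
    hal = build_hal_space(tokens, n=2)
    buf = io.StringIO()
    write_csv(hal, sorted(set(tokens)), buf)
    print(buf.getvalue())
```

Because the score for a pair of words adds Matrix[w1][w2] and Matrix[w2][w1], it does not matter for the ranking whether rows record preceding or following context.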

What output does it give?

  • For building the database:
    • A loaded Emdros database
    • A list of the word-forms found in the text (the Q word-forms mentioned on the HAL-space page).
  • For querying the database:
    • Optionally, a Comma-Separated Value (CSV) file containing the entire HAL-space, suitable for loading into a spreadsheet.

    • An output file with data for the words you are particularly interested in.

      For each such word, the file lists the words with which it co-occurs most frequently and most closely.

      If the word-form you are interested in is w1 and a word-form with which it co-occurs is w2, then the score is calculated as Matrix[w1][w2] + Matrix[w2][w1].

      This score is printed twice: first in scaled form, that is, multiplied by a user-specified factor and divided by the text length (the number of words in the text); and second in raw form, exactly as it came from the matrix.

      The list is sorted, so that the "heaviest" words come first. The user can specify how many words to put in the list.
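The scoring and sorting described above can be sketched as follows. This is a hypothetical Python function, not the example's implementation. It assumes the HAL-space is held as nested dictionaries indexed as matrix[w1][w2] and that the text length is a word count. It also breaks ties between equal scores alphabetically, which the original does not specify. The matrix values and the output layout are made up for illustration.

```python
def top_cooccurrents(matrix, w1, factor, text_length, max_values):
    """Return up to max_values (w2, scaled, raw) tuples for w1, heaviest first.

    raw    = Matrix[w1][w2] + Matrix[w2][w1]
    scaled = raw * factor / text_length
    """
    row = matrix.get(w1, {})
    # Candidates: every word that co-occurs with w1 in either direction.
    candidates = set(row)
    candidates.update(w2 for w2, col in matrix.items() if w1 in col)
    results = []
    for w2 in candidates:
        raw = row.get(w2, 0) + matrix.get(w2, {}).get(w1, 0)
        scaled = raw * factor / text_length
        results.append((w2, scaled, raw))
    # Heaviest first; ties broken alphabetically (an assumption).
    results.sort(key=lambda t: (-t[2], t[0]))
    return results[:max_values]


# Hypothetical HAL-space values, for illustration only.
example_hal = {
    "cat": {"the": 4, "sat": 1},
    "sat": {"cat": 3, "the": 1},
    "mat": {"the": 2},
    "the": {"cat": 1},
}

if __name__ == "__main__":
    for w2, scaled, raw in top_cooccurrents(example_hal, "cat", 100, 10, 5):
        print(f"{w2}\t{scaled:.2f}\t{raw}")
```

For "cat", "the" scores 4 + 1 = 5 and "sat" scores 1 + 3 = 4, so "the" is listed first.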

What input does it need?

  1. For loading the database: Only a text file.
  2. For querying the database: A configuration file with a special format.

What is a configuration file?

A configuration file is a plain-text file that looks like a Unix configuration file and holds the information needed to run the HAL example. Its format is described on a later page.

What information does a configuration file contain?

A configuration file contains the following:

  • The database name which holds the text to analyze.
  • The sliding window width, n.
  • The name of the CSV file to which the whole HAL-space is written, suitable for reading into a spreadsheet ("none" if you don't want a CSV file).
  • The name of the output file containing the output for the words you are interested in.
  • The words you are interested in.
  • The maximum number of values to list for each word you are interested in.
  • The factor by which to multiply each score for a given word, after the score has been divided by the number of words in the text. Normalizing by text length in this way can come in handy if you wish to compare texts of different lengths.
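As a rough illustration, the fields listed above could be held in memory like this once the configuration file has been read. The class name HalConfig and its field names are hypothetical and do not reflect the actual keys of the configuration file. The scale method shows why the factor helps when comparing texts: a raw score of 5 in a 1000-word text and a raw score of 10 in a 2000-word text scale to the same value.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HalConfig:
    """Hypothetical in-memory form of the configuration fields.
    Field names are illustrative, not the example's real keys."""
    database: str        # Emdros database holding the text
    window_width: int    # n, the sliding window width
    csv_file: str        # "none" disables CSV output
    output_file: str     # output for the words of interest
    words: List[str]     # the words you are interested in
    max_values: int      # maximum number of values per word
    factor: float        # multiplier applied after dividing by text length

    def __post_init__(self) -> None:
        # Basic sanity checks; the real example may validate differently.
        if self.window_width < 1:
            raise ValueError("window width n must be at least 1")
        if self.max_values < 1:
            raise ValueError("max_values must be at least 1")

    def wants_csv(self) -> bool:
        return self.csv_file != "none"

    def scale(self, raw: int, text_length: int) -> float:
        # Same as dividing by the text length and then multiplying by the factor.
        return raw * self.factor / text_length
```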
