What is a HAL space?

Background

HAL stands for:

Hyperspace Analogue to Language

It was invented by a research group at University of California at Riverside. It has a homepage here.

HAL is a numeric method for analysing text. It does so by running a sliding window of fixed length across a text, calculating a matrix of word cooccurrence values along the way.

Informal definition

A HAL space is an QxQ matrix of integers, where Q is the number of distinct word-forms in the text. For example, if the text contains 50,000 running words, of which 6,000 are unique forms, then the corresponding HAL space will be a 6,000x6,000 matrix.

Each entry in the matrix is a sum of all values arising from running the sliding window over the text.

The sliding window is n words wide, e.g., 8.

Then, for a given word, say, the one at index t, a value is added to the matrix for each of the word pairs which arise by pairing the word at index

t

with each of the words at indexes

t-n to t-1

.

This basically means the n words before the one at index t are paired with the word at index t.

For each of these word-pairs, a value is calculated as follows: If the word before t is at index j (e.g., t-3), then the value is

n - |t-j| + 1

For example (t=10, j=10-3=7):

8 - |10 - 7| + 1 = 6

This basically means that words close to t get a higher score than words farther away.

If the word-form corresponding to t has number p in the HAL-matrix, and the word-form corresponding to j has number q in the HAL-matrix, then this value is added to the p'th row and the q'th column.

Mathematical definition

Assume or define the following:

  • Assume that the words are numbered 1..M.
  • The words themselves will be called T(1), T(2), ..., T(M) for easy reference.
  • Assume that there are Q unique forms. Then the forms will be called F(0), F(1), F(2), ..., F(Q-1).
  • Assume that the function Form(n) maps word-indexes in 1..M to form-indexes in 0..Q-1. For example, if T(2) has the form F(10), then Form(2) = 10.
  • Assume that the window-width is n.
  • Define the delta function D(a,b) as follows: D(a,b) = n - |a-b| + 1
  • Assume that Matrix[0..Q-1][0..Q-1] is the HAL matrix, and that all values are initially 0.

The HAL space is calculated as follows:

  1. Initialize all matrix-cells to be 0.
  2. For each word-index i in the range 2..M:
    1. For each word-index i2 in the range i-n..i-1 (the lower bound of this range must be adjusted if there is no such word, e.g., if n=4 and i is 2, the first value will be 1, not -2):
      1. Let d = D(i2,i)
      2. Add d to the HAL matrix's Form(i)'th row and Form(i2)'th column. I.e., add d to the existing value of Matrix[Form(i)][Form(i2)].

Previous:Part I: Preliminaries
Up:Part I: Preliminaries
Next:What does the example do?