What is a HAL space?

Background

HAL stands for Hyperspace Analogue to Language. It was invented by a research group at the University of California at Riverside. It has a homepage here.

HAL is a numeric method for analysing text. It runs a sliding window of fixed length across a text, calculating a matrix of word co-occurrence values along the way.

Informal definition

A HAL space is a QxQ matrix of integers, where Q is the number of distinct word-forms in the text. For example, if the text contains 50,000 running words, of which 6,000 are unique forms, then the corresponding HAL space will be a 6,000x6,000 matrix.

Each entry in the matrix is the sum of all values contributed to it as the sliding window runs over the text. The sliding window is n words wide, e.g., n = 8. For a given word, say the one at index t, a value is added to the matrix for each word pair formed by pairing the word at index t with each of the words at indexes t-n to t-1. In other words, each of the n words before the one at index t is paired with the word at index t.

For each of these word pairs, a value is calculated as follows: if the word before t is at index j (e.g., t-3), then the value is

n - |t-j| + 1

For example (n=8, t=10, j=10-3=7):

8 - |10 - 7| + 1 = 6

This means that words close to t get a higher score than words farther away.

If the word-form corresponding to t has number p in the HAL matrix, and the word-form corresponding to j has number q in the HAL matrix, then this value is added to the entry in the p'th row and the q'th column.

Mathematical definition

Assume or define the following:
The HAL space is calculated as follows:
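As a minimal sketch of the informal definition above (not the formal definition itself), the calculation could look like this in Python. The function names hal_weight and hal_space are hypothetical. The sketch also assumes that the text is already split into word-forms, that word-forms are numbered in order of first appearance, and that the window is simply cut short near the start of the text.

```python
def hal_weight(n, t, j):
    """Value contributed by the word at index j to the word at index t."""
    return n - abs(t - j) + 1


def hal_space(tokens, n=8):
    """Return (index, matrix) for a list of word-forms.

    index maps each distinct word-form to its row/column number
    (assumption: numbered in order of first appearance).
    matrix is a QxQ list of lists of integers.
    """
    index = {}
    for word in tokens:
        index.setdefault(word, len(index))

    q_size = len(index)
    matrix = [[0] * q_size for _ in range(q_size)]

    for t, word in enumerate(tokens):
        p = index[word]
        # Pair the word at t with each of the (up to) n words before it.
        # Assumption: positions before the start of the text are skipped.
        for j in range(max(0, t - n), t):
            q = index[tokens[j]]
            matrix[p][q] += hal_weight(n, t, j)

    return index, matrix


if __name__ == "__main__":
    index, matrix = hal_space(["a", "b", "a"], n=2)
    print(index)   # {'a': 0, 'b': 1}
    print(matrix)  # [[1, 2], [2, 0]]
```

In the small example, the second "a" (t=2) receives 1 from the first "a" (distance 2) and 2 from "b" (distance 1), which matches the rule that closer words score higher. A dense list of lists is used only for readability; for a vocabulary of several thousand word-forms a sparse structure would likely be preferable.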