Emdros: A Corpus Query System

Emdros - the database engine for analyzed or annotated text

What is Emdros?

Emdros is an open-source text database engine for storing and retrieving analyzed or annotated text.

It has a powerful query language for asking relevant questions of the data.

Where is it applicable?

Emdros applies widely in fields that deal with analyzed or annotated text. Application domains include linguistics, content distribution (mobile, desktop), publishing, and text processing.

What does it do well?

Linguistic analyses are a primary target domain. This includes all levels of analysis, such as morphology, syntax, and discourse analysis, and even phonology to some extent.
Content distribution, which involves shipping databases of content to users' devices (mobile, desktop), is also a primary target domain. Emdros's proprietary BPT engine supports encryption of content and keeps shipped databases very small.
Publishing and document repository systems can also make profitable use of Emdros. The engine supports storage and annotation of text at any level of granularity.
Text processing may benefit from Emdros if the problem involves annotating the text.

Emdros provides a conceptual model of text which can be quite liberating to use once it has been grasped.

Meta-data may also be stored, so long as there is some textual element with which it can be associated.

Emdros is good both for corpus linguistics (large amounts of text) and for field-linguistics (smaller amounts of data).

Fixed corpora, such as Biblical texts, are good candidates for Emdros. Emdros is currently being used for large databases of the Hebrew Bible and several Greek New Testaments that have been analyzed linguistically.

Dictionaries are another possible target. Emdros supports structuring text documents down to minute details without losing the big picture.

Emdros and XML

Emdros embodies a particular model of text called the EMdF model. The primary advantage over XML's data model is that object types (such as pages and chapters) need not be hierarchically structured or embedded, but may overlap. In addition, objects (such as a clause or a phrase) need not be contiguous, but may have gaps.
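
The two properties named above, overlapping objects and objects with gaps, can be illustrated with a small Python sketch. It assumes a simplified view in which each object is a typed set of integer text positions (the EMdF literature calls these positions monads). The class name `TextObject` and its methods are hypothetical and exist only for illustration; they are not part of the Emdros API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextObject:
    """Hypothetical illustration: an object is a typed set of text positions."""
    object_type: str
    positions: frozenset  # integer positions of the text units the object covers

    def has_gaps(self) -> bool:
        """True if the positions do not form one contiguous run."""
        if not self.positions:
            return False
        span = max(self.positions) - min(self.positions) + 1
        return len(self.positions) != span

    def overlaps(self, other: "TextObject") -> bool:
        """True if the objects share positions but neither contains the other."""
        shared = self.positions & other.positions
        nested = (self.positions <= other.positions
                  or other.positions <= self.positions)
        return bool(shared) and not nested


# A chapter covering positions 1-6 and a page covering positions 4-10:
# the page break falls inside the chapter, so the two overlap without nesting.
chapter = TextObject("chapter", frozenset(range(1, 7)))
page = TextObject("page", frozenset(range(4, 11)))

# A clause interrupted by other material covers positions 1-2 and 5-6 only.
clause = TextObject("clause", frozenset({1, 2, 5, 6}))

print(chapter.overlaps(page))  # True
print(clause.has_gaps())       # True
```

A strictly hierarchical tree cannot represent the chapter and the page above without splitting one of them, whereas a set-of-positions view represents both directly.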

Emdros supports easy playback of databases into whatever format is required (XML, (X)HTML, JSON, YAML, LaTeX, etc.) using scriptable style sheets that determine what to do with each kind of object in the database.

Emdros's mql program can output its results in XML. The XML carries its own standalone DTD and validates with a validating parser.

More information

A short paper gives an overview of Emdros. Apart from the website, it should be your first stop:

Emdros overview paper from COLING 2004.

There is also some documentation.

Who is behind this?

Crist-Jan Doedens developed MdF and QL. These two underlie Emdros.

Ulrik Sandborg-Petersen developed EMdF and MQL, and wrote Emdros from scratch.