emdros: A Corpus Query System. Explaining Emdros

Emdros - the database engine for analyzed or annotated text

Explaining Emdros

Emdros is a text database engine. Below we explain the three parts of this definition in turn. Other useful information will follow after that.

Emdros is an engine

Emdros is an engine. That is, Emdros is middleware. You must write a client that takes advantage of Emdros, or use one of the clients provided. Emdros, in turn, rests on an underlying storage backend: either a database engine such as PostgreSQL or MySQL, or its own proprietary storage layer, the BPT engine.

Thus Emdros is usually not useful by itself: to apply it in your application domain, you must write a client on top of Emdros, unless one of the clients provided suits your needs.

Emdros is a database engine

Emdros provides database services. The MQL query language allows you to create, update, delete, and query the data you need to manipulate.

For example, it provides a means of creating text objects such as words, phrases, and clauses. It provides a means of updating those objects. It provides a means of deleting them again. And it provides powerful ways of finding them again in interesting environments.
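The four kinds of operation just listed can be illustrated with a small in-memory sketch. This is not the Emdros API and not MQL: the names `TextObject`, `ToyTextStore`, and `find` are hypothetical, and the sketch assumes, for illustration only, that each object is identified by the set of integer word positions it covers. It shows creating words, a phrase, and a clause, updating and deleting objects, and finding objects "in an environment" (here: words contained in a given phrase or clause).

```python
from dataclasses import dataclass, field


@dataclass
class TextObject:
    """A toy text object: a type, the word positions it covers, and features."""
    object_id: int
    object_type: str        # e.g. "word", "phrase", "clause"
    positions: frozenset    # integer word positions covered (assumption of this sketch)
    features: dict = field(default_factory=dict)


class ToyTextStore:
    """Hypothetical in-memory store; it only mimics the kinds of service described."""

    def __init__(self):
        self._objects = {}
        self._next_id = 1

    def create(self, object_type, positions, **features):
        obj = TextObject(self._next_id, object_type, frozenset(positions), dict(features))
        self._objects[obj.object_id] = obj
        self._next_id += 1
        return obj.object_id

    def update(self, object_id, **features):
        self._objects[object_id].features.update(features)

    def delete(self, object_id):
        del self._objects[object_id]

    def get(self, object_id):
        return self._objects[object_id]

    def find(self, object_type, within=None, **features):
        """Return ids of objects of a type, optionally contained in another object."""
        container = self._objects[within].positions if within is not None else None
        hits = []
        for obj in self._objects.values():
            if obj.object_type != object_type:
                continue
            # Containment test: every position of obj lies inside the container.
            if container is not None and not obj.positions <= container:
                continue
            if any(obj.features.get(k) != v for k, v in features.items()):
                continue
            hits.append(obj.object_id)
        return sorted(hits)


store = ToyTextStore()
w1 = store.create("word", [1], surface="The", pos="article")
w2 = store.create("word", [2], surface="door", pos="noun")
w3 = store.create("word", [3], surface="opened", pos="verb")
w4 = store.create("word", [4], surface="slowly", pos="adverb")
np_id = store.create("phrase", [1, 2], phrase_type="NP")
clause_id = store.create("clause", [1, 2, 3, 4])

store.update(w2, lemma="door")                               # update an object
nouns_in_np = store.find("word", within=np_id, pos="noun")   # query in a context
store.delete(w4)                                             # delete an object
words_in_clause = store.find("word", within=clause_id)
```

In a real deployment these operations would be expressed as MQL statements sent to Emdros rather than as Python method calls.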

Emdros is a text database engine

The data domain which Emdros handles is that of text. Emdros provides a certain abstraction of text that makes it ideally suited to storing and retrieving annotated text, such as linguistic analyses of a text, or document markup.

The kinds of analyses that can be stored are open-ended and are not dictated by the database model. The database model is generically suited to storing "text plus information about that text", and thus allows you, as the application designer, to write your own "data model", deciding which kinds of analyses you want in your data domain.

For example, Emdros does not dictate the kind of linguistic analyses which can be placed into the database. Therefore, analyses can be, e.g., syntactic analyses, morphological analyses, or discourse analyses, or all of these simultaneously in the same database.
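To make the idea of an application-defined data model concrete, here is a minimal sketch. Everything in it is hypothetical: the `DATA_MODEL` dictionary, the object type and feature names, and the `validate` helper are illustrations chosen for this example, not Emdros constructs. The point is only that the application designer, not the engine, decides which object types and features exist, and that morphological, syntactic, and discourse annotations can sit side by side over the same text.

```python
# Hypothetical application-defined data model: object types and their features.
DATA_MODEL = {
    "word": {"surface": str, "part_of_speech": str},   # morphological analysis
    "phrase": {"phrase_type": str},                     # syntactic analysis
    "discourse_unit": {"relation": str},                # discourse analysis
}


def validate(object_type, features, model=DATA_MODEL):
    """Check an object's features against the application's own data model."""
    if object_type not in model:
        raise ValueError(f"unknown object type: {object_type}")
    declared = model[object_type]
    for name, value in features.items():
        if name not in declared:
            raise ValueError(f"{object_type} has no feature {name!r}")
        if not isinstance(value, declared[name]):
            raise TypeError(f"{name!r} must be {declared[name].__name__}")
    return True


# Three kinds of analysis annotate the same stretch of text (word positions 1-4).
annotations = [
    ("word", {1}, {"surface": "The", "part_of_speech": "article"}),
    ("phrase", {1, 2}, {"phrase_type": "NP"}),
    ("discourse_unit", {1, 2, 3, 4}, {"relation": "background"}),
]
all_valid = all(validate(obj_type, feats) for obj_type, _, feats in annotations)
```

Adding a new kind of analysis in this sketch means adding one entry to `DATA_MODEL`; nothing else has to change.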

Emdros is particularly useful in domains where research questions need to be asked of databases of annotated text. These include e-humanities, dictionary-making, Biblical-language research (Greek or Hebrew), other linguistic research, and research on annotated text in general.

Emdros has a particular model of text called the EMdF model. Users have attested, and our experience shows, that the EMdF model can be quite liberating when dealing with text as a programmer or database designer. Thus any application that deals with annotated text will likely benefit from Emdros and the EMdF model.

Emdros architecture

Emdros fits into a software architecture as follows:

+---------------+
|    Client     | User-written / a few clients ship with Emdros
+---------------+
       |
+---------------+
|     MQL       | Emdros
+---------------+
       |
+---------------+
|     EMdF      | Emdros
+---------------+
       |
+---------------+
|      DB       | PostgreSQL, MySQL, or the BPT engine
+---------------+

At the top, there is a client. Emdros ships with a number of clients, but if none of them suits your application domain, then you, the user, must write a client to make Emdros useful. This client will take advantage of Emdros's services to provide for the needs of your particular database domain.

Then come the two Emdros layers: the MQL layer and the EMdF layer. The MQL layer provides an interface to the MQL language. The MQL layer automatically takes advantage of the EMdF layer, which translates the MQL queries into SQL calls to the underlying database, or into calls to the proprietary BPT engine.

The underlying database takes care of storing the data, and retrieving it as directed by the EMdF layer.
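The division of labour between the layers can be sketched in a few lines of Python. This is a toy under stated assumptions, not how Emdros is implemented: the class names `StorageLayer`, `ObjectLayer`, and `QueryLayer` are hypothetical, the table layout is invented for the example, SQLite stands in for the backend only because it ships with Python, and the query layer exposes a plain method rather than the MQL language.

```python
import sqlite3


class StorageLayer:
    """Bottom layer: a relational backend (SQLite here, purely for the sketch)."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE objects (id INTEGER PRIMARY KEY, object_type TEXT, "
            "first_pos INTEGER, last_pos INTEGER, surface TEXT)"
        )

    def execute(self, sql, params=()):
        return self.conn.execute(sql, params).fetchall()


class ObjectLayer:
    """Middle layer (stand-in for EMdF): turns object requests into SQL calls."""

    def __init__(self, storage):
        self.storage = storage

    def add_object(self, object_type, first_pos, last_pos, surface=None):
        self.storage.execute(
            "INSERT INTO objects (object_type, first_pos, last_pos, surface) "
            "VALUES (?, ?, ?, ?)",
            (object_type, first_pos, last_pos, surface),
        )

    def objects_inside(self, object_type, first_pos, last_pos):
        return self.storage.execute(
            "SELECT surface FROM objects WHERE object_type = ? "
            "AND first_pos >= ? AND last_pos <= ? ORDER BY first_pos",
            (object_type, first_pos, last_pos),
        )


class QueryLayer:
    """Upper layer (stand-in for MQL): the interface that a client talks to."""

    def __init__(self, object_layer):
        self.object_layer = object_layer

    def words_in(self, first_pos, last_pos):
        rows = self.object_layer.objects_inside("word", first_pos, last_pos)
        return [surface for (surface,) in rows]


# The client sits on top and never issues SQL itself.
object_layer = ObjectLayer(StorageLayer())
for pos, surface in enumerate(["The", "door", "opened"], start=1):
    object_layer.add_object("word", pos, pos, surface)
object_layer.add_object("clause", 1, 3)

queries = QueryLayer(object_layer)
result = queries.words_in(2, 3)
```

Because the client only sees the top layer, the storage backend could be swapped without changing client code, which is the property the layered diagram is meant to convey.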

Emdros overview-paper

A quick overview of Emdros can be found in my short paper given at COLING 2004. It should be one of your first stops after this website for information on Emdros.