Import

What's this?

Say you have some linguistic data -- for example, a corpus made by others, or some data you've collected yourself. Let's call the representation you already have the external representation.

You want to import it into an Emdros database for easy searching and/or updating.

How do I import linguistic data into Emdros?

Well, there are several strategies, but they all assume you've got a database schema figured out.

This is because you need a clear mapping from your data to your EMdF data model: you need to know exactly which object type, monads, and features each part of your data will become.

Once you've figured that out, here are some variations on the theme of importing data. They all assume that the database has already been created using the EMdF database schema script, which you should have written by this point.

  1. Read your data into memory, processing it on the fly and creating MQL statements which you store in a file for later import using the mql(.exe) program. A scripting language such as Python, Ruby, Perl, or AWK is well suited to this.

  2. Same as (1), but with the MQL statements piped directly to the mql(.exe) program without storing in a file.

  3. Read your data into memory, and use the Emdros C++ libraries to issue the MQL statements directly over a live database connection, thus feeding the data into the database on the fly.

  4. Same as (3), but using one of the SWIG bindings (at the time of writing, Java, Perl, Python, Ruby, and C#/.Net are supported).
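Strategies (1) and (2) differ only in where the generated text goes, so the statement-generating code can be shared. Below is a minimal Python sketch of that idea. It assumes a hypothetical schema with an object type Word and a string feature surface, and it assumes that double quotes and backslashes in MQL string literals are escaped with a backslash; check the CREATE OBJECT syntax and the quoting rules against the MQL Programmer's Guide before relying on it. The helper names mql_string, emit_create_object, and emit_words are made up for this illustration.

```python
import io


def mql_string(value):
    """Quote a Python string as an MQL string literal.

    Assumption: backslash and double quote are escaped with a backslash.
    """
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def monad_range(first, last):
    """Format a contiguous stretch of monads, e.g. {3} or {3-7}."""
    if first == last:
        return "{%d}" % first
    return "{%d-%d}" % (first, last)


def emit_create_object(out, object_type, first, last, features):
    """Write one CREATE OBJECT statement to the text stream `out`."""
    out.write("CREATE OBJECT FROM MONADS = %s\n" % monad_range(first, last))
    out.write("[%s\n" % object_type)
    for name, value in features.items():
        out.write("  %s := %s;\n" % (name, mql_string(value)))
    out.write("]\nGO\n\n")


def emit_words(out, tokens):
    """Emit one hypothetical Word object per token, one monad each."""
    for monad, surface in enumerate(tokens, start=1):
        emit_create_object(out, "Word", monad, monad, {"surface": surface})


if __name__ == "__main__":
    # Demonstration only: write to an in-memory buffer and print it.
    buf = io.StringIO()
    emit_words(buf, ["In", "the", "beginning"])
    print(buf.getvalue())
```

Because emit_words only needs something with a write method, the same code serves strategy (1) when given a file opened for writing, and strategy (2) when given the standard input of a running mql(.exe) process.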

How do I read my data into memory?

Well, you probably need to create some kind of in-memory representation of the data, as an intermediate between the external representation (say, in XML) and the MQL statements which you will emit.

Object-oriented programming languages are easy to use here, because you can wrap each EMdF object in a language-specific object and give it methods to read from the external representation, to emit MQL, etc.
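As an illustration of the object-oriented approach, here is a minimal Python sketch of a wrapper class that can read itself from the external representation and emit MQL. Everything specific in it is an assumption made for the example: the XML layout (w elements with a lemma attribute), the Word object type with surface and lemma features, and the backslash-based string quoting, which should be checked against the MQL Programmer's Guide.

```python
import xml.etree.ElementTree as ET


def mql_string(value):
    """Quote a string for MQL (assumed rule: backslash-escape \\ and ")."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


class Word:
    """In-memory stand-in for one hypothetical EMdF Word object."""

    def __init__(self, monad, surface, lemma):
        self.monad = monad
        self.surface = surface
        self.lemma = lemma

    @classmethod
    def from_xml(cls, element, monad):
        """Build a Word from a hypothetical <w lemma="...">text</w> element."""
        return cls(monad, element.text or "", element.get("lemma", ""))

    def to_mql(self):
        """Return a CREATE OBJECT statement for this word."""
        return (
            "CREATE OBJECT FROM MONADS = {%d}\n"
            "[Word\n"
            "  surface := %s;\n"
            "  lemma := %s;\n"
            "]\nGO\n"
            % (self.monad, mql_string(self.surface), mql_string(self.lemma))
        )


def read_words(root):
    """Assign consecutive monads, starting at 1, to the <w> elements."""
    return [Word.from_xml(el, monad)
            for monad, el in enumerate(root.iter("w"), start=1)]


if __name__ == "__main__":
    doc = ET.fromstring('<s><w lemma="be">is</w><w lemma="good">good</w></s>')
    for word in read_words(doc):
        print(word.to_mql())
```

Keeping the reading and the emitting in the same class means that a change to the schema touches only one place in the import script.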

Often, however, you can make do with something like Python's dictionaries, or Perl's or AWK's associative arrays.
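A dictionary-based representation might look like the following Python sketch. It assumes a hypothetical external format with one word per line (surface form and part of speech separated by a tab) and a blank line between sentences; the key names are likewise made up for the example. Each word gets the next free monad, and each sentence records the first and last monad it covers.

```python
def read_records(lines):
    """Turn tab-separated lines into word and sentence dictionaries."""
    words, sentences = [], []
    start = None  # first monad of the sentence currently being read
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence, if there is one.
            if start is not None:
                sentences.append({"first_monad": start,
                                  "last_monad": len(words)})
                start = None
            continue
        surface, pos = line.split("\t")
        monad = len(words) + 1
        words.append({"monad": monad, "surface": surface, "pos": pos})
        if start is None:
            start = monad
    # Close a final sentence that is not followed by a blank line.
    if start is not None:
        sentences.append({"first_monad": start, "last_monad": len(words)})
    return words, sentences
```

The resulting lists can then be walked once to emit the MQL statements for each object type.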

