How to build shared libraries

Attribution

The following was contributed by Prof. Dr. Kirk E. Lowery, klowery{at}grovescenterPARIS.edu. (remove French city).

Shared Library Fundamentals

Libraries first came about when complex programs organized function calls in logical groupings. With the advent of GUIs and object-oriented programming, functions began to be designed for use by many programs, and so needed to be grouped conveniently into one file of object modules produced by the compiler.

These libraries are called "static" libraries and, on Linux systems, usually carry the ".a" extension, e.g., libemdf.a. A programmer wishing to use the functions in a particular library calls them using the syntax documented in the library's header file, e.g., monads.h. This documented set of calls is the "Application Program Interface (API)". At link time, the linker makes a copy of the desired functions' binary code and includes it in the final executable binary of the application program. Such a program is called "statically" linked, and is independent of the presence of the original static library. The program can be used on any compatible system whether or not the static library is present, since the program carries all needed code within itself.

As operating systems became more complex, and especially with the appearance of client-server technology, hard disks and RAM became cluttered with many copies of the same functions. Application program binaries were increasing in size by whole orders of magnitude. In order to improve efficiency, the concept of "shared" libraries was created.

The idea of a shared library is simple: code that is used by more than one application needs to reside only *once* on the hard disk and in memory. And it does not need to be loaded into memory until called for. It can even be removed from memory if deemed necessary.

For an application such as Emdros, the necessity for and advantages of shared libraries are obvious. Emdros is intended to work with databases, standing in between a client application and the database server itself. Such an environment is inherently multi-tasking and multi-user. The more basic of Emdros' function calls (e.g., the creation and manipulation of monads, or MQL queries) will be used thousands of times during the session of one user. We don't want thousands of copies of dozens of functions filling up memory, and once the overhead of loading a function into memory is paid, we don't want to have to pay it over and over again.

Using and Linking with Shared Libraries

Let's use the example of Emdros' own application program that uses the Emdros libraries: mql (or, under Windows: mql.exe). Here is a code snippet from mql.cpp where an Emdros library class is used:

// Make EMdFOutput
EMdFOutput *pOut = new EMdFOutput(kCSISO_8859_1, &std::cout, output_kind);

A new instance of the class EMdFOutput is created here, and the pointer to it is given the name pOut. Accompanying this object is a set of methods (or, functions) which can be used with it. We don't need to know the details of how EMdFOutput does its magic. All we need to know is found in emdf_output.h:

class EMdFOutput {
protected:
  eCharsets m_charset;
  eOutputKind m_output_kind;
  std::ostream *m_pStream;
  int m_current_indent;
  int m_indent_chars;
public:
  EMdFOutput(eCharsets charset, std::ostream *pStream, 
              eOutputKind output_kind, int indent_chars = 3);
  ~EMdFOutput();
  // Getting
  eOutputKind getOutputKind(void) const { return m_output_kind; }
  // Output
  void increaseIndent();
  void decreaseIndent();
  void out(std::string s);
  void outCharData(std::string s);
  void newline();
  void flush() { *m_pStream << std::flush; };
  // XML members
  void printXMLDecl();
  void printDTDstart(std::string root_element);
  void printDTDend();
  void startTag(std::string name, bool newline_before = false);
  // Must have pairs of (attribute name, attribute value)
  void startTag(std::string name, const AttributePairList& 
                 attributes, bool newline_before = false); 
  // for tags of type <tag/>
  void startSingleTag(std::string name, bool newline_before = false); 
  // for tags of type <tag/>
  void startSingleTag(std::string name, const AttributePairList& 
                 attributes, bool newline_before = false); 
  void endTag(std::string name, bool newline_before = false);
protected:
  void emitAttributes(const AttributePairList& attributes);
};

This API tells us all we need in order to create instances of objects of this class, what information we need to give to it, what kind of information we can expect from it, and what functions we can use to manipulate it.
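
To make this division of labor concrete, here is a minimal sketch in a single file. MiniOutput is a hypothetical stand-in invented for this illustration; it is not the real EMdFOutput and only borrows a few of its method names. The point is that the client function in the middle compiles against the declarations alone, exactly as mql.cpp compiles against emdf_output.h.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// --- "Header" part: all the caller ever sees. -------------------------
// MiniOutput is a hypothetical stand-in, NOT the real EMdFOutput class.
class MiniOutput {
public:
  explicit MiniOutput(std::ostream *pStream);
  void out(const std::string& s);
  void startTag(const std::string& name);
  void endTag(const std::string& name);
  void newline();
  void flush();
private:
  std::ostream *m_pStream;
};

// --- Client code: written against the declarations above only. --------
std::string render_greeting() {
  std::ostringstream buffer;
  MiniOutput output(&buffer);
  output.startTag("greeting");
  output.out("hello");
  output.endTag("greeting");
  output.newline();
  output.flush();
  return buffer.str();
}

// --- "Library" part: normally a separate .cpp file compiled into the --
// library. The client never needs to read it.
MiniOutput::MiniOutput(std::ostream *pStream) : m_pStream(pStream) {
  assert(pStream != nullptr);  // the stand-in requires a valid stream
}
void MiniOutput::out(const std::string& s) { *m_pStream << s; }
void MiniOutput::startTag(const std::string& name) {
  *m_pStream << '<' << name << '>';
}
void MiniOutput::endTag(const std::string& name) {
  *m_pStream << "</" << name << '>';
}
void MiniOutput::newline() { *m_pStream << '\n'; }
void MiniOutput::flush() { *m_pStream << std::flush; }
```

If the "library" part were replaced by a different implementation with the same declarations, render_greeting() would not need to change.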

In order to compile the mql application program we issue the command:

g++ -g -o .libs/mql mql.o  -L/usr/local/src/emdros/EMdF \
                           -L/usr/local/src/emdros/MQL \
                           -L/usr/local/src/emdros/pcre \
          /usr/local/src/emdros/MQL/.libs/libmql.so -lpcre_emdros \
          /usr/local/src/emdros/EMdF/.libs/libemdf.so \
             -lpq -Wl,--rpath -Wl,/usr/local/lib/emdros

The compiler is told which object file to link (mql.o), where to place the resulting executable in the source tree before installation (-o .libs/mql), and to include debugging information (-g). The "-L" option tells g++ where to find libraries to link to, and the "-l" option tells it which libraries it is to link against. For "-lpcre_emdros" the linker looks for "libpcre_emdros.so" first and then "libpcre_emdros.a", and finds only the latter. This is correct, since the pcre library is not compiled as shared, but is linked statically into whatever needs it. The other Emdros libraries are named by their full paths, and the ".so" extension (Shared Object) tells the linker that we want to link against these "dynamically," not "statically". "pq" stands for libpq, the PostgreSQL client library; PostgreSQL is the database backend chosen for this installation of Emdros. Each "-Wl," prefix passes the option that follows it through to the linker, and "--rpath" sets the "runtime path" that will be used to find the shared libraries as the program is executing.
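
The way a "-l" name turns into file names can be sketched as follows. This is only an illustration of the documented GNU ld behavior (within each directory, the shared name is tried before the static one); the function name link_candidates is made up for this example, and the sketch ignores the system default directories that the real linker searches after the "-L" ones.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Return, in search order, the files a GNU-style linker would try for
// "-l<name>" given the directories passed with "-L".
// Hypothetical helper for illustration; not part of any real tool.
std::vector<std::string> link_candidates(
    const std::vector<std::string>& lib_dirs, const std::string& name) {
  assert(!name.empty());
  std::vector<std::string> candidates;
  for (const std::string& dir : lib_dirs) {
    // Shared object first, static archive second, per directory.
    candidates.push_back(dir + "/lib" + name + ".so");
    candidates.push_back(dir + "/lib" + name + ".a");
  }
  return candidates;
}
```

For "-lpcre_emdros" with "-L/usr/local/src/emdros/pcre", the first candidate does not exist because pcre is built only as a static library, so the second one is used.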

If you are using the GNU Autobuild tools (autoconf, automake, libtool) -- and if you are not, you should be! -- then the following lines in your Makefile.am will generate the above command, assuming you have properly created the top-level configure.in file:

bin_PROGRAMS = mql
mql_LDADD = -L../EMdF -L../MQL -L../pcre -lmql @EMDFLDADD@
mql_DEPENDENCIES = @EMDFDEPS@ ../pcre/libpcre_emdros.a ../MQL/libmql.la
mql_SOURCES = mql.cpp
INCLUDES = -I../include
CLEANFILES = *~ core .deps/*
AM_CXXFLAGS = @CXXFLAGS@ @DEBUGFLAG@

Note how libmql is to be linked as a shared library so it is listed as "libmql.la" whereas libpcre_emdros is listed as "libpcre_emdros.a" for static linking.

It is beyond the scope of this HOWTO to get into all the details of the Autobuild tools, but the reader is referred not just to the excellent documentation that comes along with the software, but also to the highly recommended tutorial "The AutoBook":

<http://sources.redhat.com/autobook/>

Making Your Own Shared Library

Creating a shared library is pretty straightforward. But I am not going to talk about how to do it manually without using the Autobuild tools, particularly libtool. Unless your situation is trivially simple, libtool makes your life so much easier. I am going to use my experience in modifying the Emdros distribution to build shared libraries in addition to the static ones. To make matters even simpler, I will concentrate on just one library: libemdf.

There are two files that concern us in the task. First is configure.in in the top-level emdros directory. The second is EMdF/Makefile.am. I will note the elements needed for building the libraries only.

In configure.in, we need the following:

dnl Library versioning
dnl We begin with 0:0:0
LIB_CURRENT=0
LIB_REVISION=0
LIB_AGE=0
AC_SUBST(LIB_CURRENT)
AC_SUBST(LIB_REVISION)
AC_SUBST(LIB_AGE)


dnl Invoke libtool
AC_PROG_LIBTOOL


dnl
dnl Set EMDFLDADD and EMDFDEPS
dnl
if test "x$BACKEND" = "xpostgresql"; then
  EMDFLDADD="-lemdf -lpq";
  EMDFDEPS="../EMdF/libemdf.la";
else
  EMDFLDADD="-lemdf -lmysqlclient";
  EMDFDEPS="../EMdF/libemdf.la";
fi

Let's deal with the easy stuff first. In order to use libtool, we need to tell autoconf that it is going to be used, with the macro AC_PROG_LIBTOOL. We don't need AC_PROG_RANLIB, which is used for static libraries, because libtool handles all that. Because we have a choice of database backends, Emdros needs to be told which database it will be used with, so that the appropriate client library (-lpq or -lmysqlclient) is linked in alongside libemdf. These dependencies get propagated down to the lower-level makefiles, such as the one in the EMdF subdirectory.

Library Versioning

Now a more complex but essential subject is library versioning. Because many libraries form the foundation of many other libraries and programs, e.g., libc, and because these libraries are in constant development and change, a protocol was created to allow multiple versions of the same shared library to exist on the same system, so that application programs can have the version of the libraries against which they were compiled. So now the question is, when does one need more than one version of a library? The answer is, when the API changes. If the API does not change, then the application program doesn't care particularly if something "under the hood" changes.

Library versions track the *interface*, which is the set of entry points into the library. A version is written as three numbers, arranged in a hierarchy:

current interface:revision number:age number

The current interface number identifies a specific way the library functions are called. That means if there is any addition to the library functions, or any change in the way those functions are called, the data types of their parameters, etc., then the interface number of the library must change. If the source code of the library is revised by fixing bugs, improving performance, or even adding internal functionality (e.g., more rigorous tests made of input data), *but* the prototypes of the library functions have not changed, then this is a *revision* of the *current* interface, and the middle number is incremented. The runtime loader will always use the highest revision number of the current interface. Finally, the age number tells us how many previous interfaces the current one is a superset of, i.e., how many earlier interfaces are still available to binaries that were linked against them. A library with version c:r:a therefore supports interfaces c-a through c. The age must always be less than or equal to the current interface number.
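
The meaning of the age number can be spelled out as a small check. This is a sketch only: the struct VersionInfo and the function supports are names invented for this illustration, and libtool itself does not expose such a function.

```cpp
#include <cassert>

// The three libtool version numbers, current:revision:age.
// Hypothetical type for illustration only.
struct VersionInfo {
  unsigned current;
  unsigned revision;
  unsigned age;
};

// True if a program linked against interface number `iface` can run
// with a library carrying version `v`. The library provides every
// interface from (current - age) up to and including current.
bool supports(const VersionInfo& v, unsigned iface) {
  assert(v.age <= v.current);  // age may never exceed current
  return iface >= v.current - v.age && iface <= v.current;
}
```

A library at 3:2:1, for example, serves programs linked against interfaces 2 and 3, but not those linked against interface 1. The revision number plays no part in the check.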

Quoting from AutoBook, here are the rules for incrementing these three numbers:

1. If you have changed any of the sources for this library, the revision number must be incremented. This is a new revision of the current interface.
2. If the interface has changed, then current must be incremented, and revision reset to `0'. This is the first revision of a new interface.
3. If the new interface is a superset of the previous interface (that is, if the previous interface has not been broken by the changes in this new release), then age must be incremented. This release is backwards compatible with the previous release.
4. If the new interface has removed elements with respect to the previous interface, then you have broken backward compatibility and age must be reset to `0'. This release has a new, but backwards incompatible interface.
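
The rules just quoted can be written down as a small function, which may help when deciding what to put in configure.in for a new release. This is a sketch under my own assumptions: the names VersionInfo, Change and next_version are invented here, and classifying a release into one of three kinds of change is a simplification that the maintainer still has to perform by hand.

```cpp
#include <cassert>

// The three libtool version numbers, current:revision:age.
// Hypothetical type for illustration only.
struct VersionInfo {
  unsigned current;
  unsigned revision;
  unsigned age;
};

// What happened to the library since the last release (assumed categories).
enum class Change {
  SourceOnly,         // sources changed, interface untouched
  InterfaceExtended,  // entry points added, none removed or altered
  InterfaceBroken     // entry points removed or altered
};

// Apply the update rules for one release and return the new version.
VersionInfo next_version(VersionInfo v, Change change) {
  assert(v.age <= v.current);
  switch (change) {
  case Change::SourceOnly:
    ++v.revision;     // new revision of the current interface
    break;
  case Change::InterfaceExtended:
    ++v.current;      // new interface ...
    v.revision = 0;   // ... in its first revision ...
    ++v.age;          // ... that still serves older binaries
    break;
  case Change::InterfaceBroken:
    ++v.current;      // new interface ...
    v.revision = 0;   // ... in its first revision ...
    v.age = 0;        // ... that serves no older binaries
    break;
  }
  return v;
}
```

Starting from 0:0:0, a bug-fix release gives 0:1:0, a following release that adds a function gives 1:0:1, and a release after that which removes a function gives 2:0:0.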

Thus, in our example above, since this is the first interface for the shared libraries, it receives the number 0. There are no revisions for this new interface, so revision=0 and age must be 0. Here is a very important principle:

THE SOFTWARE RELEASE VERSION
AND THE SHARED LIBRARY VERSION NUMBERING SCHEMES
HAVE NOTHING TO DO WITH EACH OTHER!

Even though Emdros was at Release 1.1.7 when shared library support was added, the numbering scheme of the libraries starts at 0:0:0 and is independent of the release numbers. The library versioning *must* conform to the four rules listed above. If other parts of Emdros are changed, but not the libraries, then the library versions remain the same.

Finally, the AC_SUBST macro exports the values of the library versions for substitution in lower-level makefiles.
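
To see how little the library version has to do with the release number, it helps to look at the file names that result. On Linux, libtool derives the installed name from current:revision:age as lib.so.(current-age).(age).(revision); other platforms use different schemes. The sketch below encodes only that Linux convention, and the names VersionInfo, linux_soname and linux_file_name are invented for this illustration.

```cpp
#include <cassert>
#include <string>

// The three libtool version numbers, current:revision:age.
// Hypothetical type for illustration only.
struct VersionInfo {
  unsigned current;
  unsigned revision;
  unsigned age;
};

// The name recorded in programs that link against the library,
// e.g. "libemdf.so.0". Linux convention only.
std::string linux_soname(const std::string& base, const VersionInfo& v) {
  assert(v.age <= v.current);
  return base + ".so." + std::to_string(v.current - v.age);
}

// The full name of the installed file, e.g. "libemdf.so.0.0.0".
std::string linux_file_name(const std::string& base, const VersionInfo& v) {
  return linux_soname(base, v) + "." + std::to_string(v.age) + "." +
         std::to_string(v.revision);
}
```

With the 0:0:0 chosen above, libemdf is installed as libemdf.so.0.0.0, although the software release is 1.1.7. A later version of 3:2:1 would give libemdf.so.2.1.2.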

The Final Step

The build rules for the libemdf library itself are found in EMdF/Makefile.am:

pkglib_LTLIBRARIES = libemdf.la
libemdf_la_SOURCES = conn.cpp \
       emdf_wstring.cpp \
       emdfdb.cpp \
       utils.cpp \
       inst.cpp \
       monads.cpp \
       infos.cpp \
       table.cpp \
       string_func.cpp \
       inst_object.cpp \
       emdf_output.cpp
libemdf_la_LDFLAGS = -version-info @LIB_CURRENT@:@LIB_REVISION@:@LIB_AGE@

First, we tell automake about our library: the "pkglib" prefix says we want it installed in the "package" directory for libraries (/usr/local/lib/emdros in this case), and "LTLIBRARIES" says that the libraries in the following list are to be built by libtool, in both static and shared versions, using the "la" extension name. The next variable tells automake what source files are to be used for building libemdf.la. Finally, we tell automake about the version of the library, which libtool receives through the -version-info flag.

That's it, believe it or not! Automake and libtool handle all the rest. Simply run the build. For example:

aclocal && automake --add-missing && autoconf && ./configure && make install

Now do you see why we so strongly recommend the Autobuild tools? Yes, we thought so! :-)