Tuesday, September 14, 2010

Debugging dlopen() UnsatisfiedLinkError on Android

Recently I have been compiling custom native libraries for various versions of Android OS, and this brings some problems that are hard to debug. Many of them occur at runtime, during library loading in dlopen(), which is reached some time after System.loadLibrary("somelibrary") is called from Java code. The problem gets even more complex when a library loads fine on the emulator but not on a real device. I had this problem with a library that loaded fine on the stock 2.2 emulator and crashed on a Nexus One 2.2. The exceptions thrown during this operation carry little information about the real cause, and sometimes they are very misleading. For example, when dlopen() fails while loading a library, you can end up with something like this in the stack trace:


DEBUG/dalvikvm(787): Trying to load lib /data/data/mpigulski.android/files/libsomelibrary.so 0x449c10f8
INFO/dalvikvm(787): Unable to dlopen(/data/data/mpigulski.android/files/libsomelibrary.so):
Cannot load library: link_image[1995]: failed to link libsomelibrary.so
...
(big stacktrace here)
...
ERROR/mpigulski.android.LibraryLoader(787): Caused by: java.lang.UnsatisfiedLinkError: Library somelibrary not found


So it is natural to check the device to see whether the library is really missing, but... it is there! What the hell...

Debugging on the emulator is easier than debugging on a device, because the emulator has many tools that cannot be found on a real device. The emulator has a directory with a bunch of dedicated tools such as strace and tcpdump in /system/xbin/, and this directory is not present on the Nexus One 2.2. While searching for a solution to my problem I came up with two debugging methods: one uses strace, and the other uses the arm-eabi-ld linker that comes with the Android sources.

strace

The idea of using strace comes from a comment in the dvmLoadNativeCode(...) function in the Native.c file. Quote from this file:

/*
* Open the shared library. Because we're using a full path, the system
* doesn't have to search through LD_LIBRARY_PATH. (It may do so to
* resolve this library's dependencies though.)
*
* Failures here are expected when java.library.path has several entries
* and we have to hunt for the lib.
*
* The current version of the dynamic linker prints detailed information
* about dlopen() failures. Some things to check if the message is
* cryptic:
* - make sure the library exists on the device
* - verify that the right path is being opened (the debug log message
* above can help with that)
* - check to see if the library is valid (e.g. not zero bytes long)
* - check config/prelink-linux-arm.map to ensure that the library
* is listed and is not being overrun by the previous entry (if
* loading suddenly stops working on a prelinked library, this is
* a good one to check)
* - write a trivial app that calls sleep() then dlopen(), attach
* to it with "strace -p " while it sleeps, and watch for
* attempts to open nonexistent dependent shared libs
*
* This can execute slowly for a large library on a busy system, so we
* want to switch from RUNNING to VMWAIT while it executes. This allows
* the GC to ignore us.
*/
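Two of the checks from that comment (the library exists, and it is not zero bytes long) can also be done from Java right before loading the file with System.load(path). Below is a minimal sketch; the class name LibrarySanityCheck is hypothetical, and the ELF magic-number test (every ELF file starts with the bytes 0x7F 'E' 'L' 'F') is an extra check beyond the comment's list, not something Native.c does.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical debug helper: inspects a library file before System.load(path) is called.
final class LibrarySanityCheck {
    private LibrarySanityCheck() {}

    /** Returns null if the file looks like an ELF object, otherwise a short description of the problem. */
    static String problemWith(File lib) {
        // "make sure the library exists on the device"
        if (!lib.isFile()) {
            return "missing: " + lib.getPath();
        }
        // "check to see if the library is valid (e.g. not zero bytes long)"
        if (lib.length() == 0) {
            return "zero bytes long: " + lib.getPath();
        }
        // Extra check, not in the Native.c list: ELF files start with 0x7F 'E' 'L' 'F'.
        byte[] magic = new byte[4];
        int off = 0;
        try (InputStream in = new FileInputStream(lib)) {
            while (off < magic.length) {
                int n = in.read(magic, off, magic.length - off);
                if (n < 0) {
                    break; // file is shorter than 4 bytes
                }
                off += n;
            }
        } catch (IOException e) {
            return "unreadable: " + e.getMessage();
        }
        if (off < magic.length || magic[0] != 0x7F || magic[1] != 'E'
                || magic[2] != 'L' || magic[3] != 'F') {
            return "not an ELF file: " + lib.getPath();
        }
        return null;
    }

    public static void main(String[] args) {
        String path = args.length > 0 ? args[0]
                : "/data/data/mpigulski.android/files/libsomelibrary.so";
        String problem = problemWith(new File(path));
        System.out.println(problem == null ? "looks like an ELF file: " + path : problem);
    }
}
```

If this reports nothing wrong and dlopen() still fails, the cause is most likely a dependency or a missing symbol, which is where the two methods below come in.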



The comment lists some hints on how to debug dlopen() problems. The one that was useful for me was running strace on the process that fails while loading the library. Steps to do it:
1) Place Thread.sleep(10000); before System.loadLibrary("somelibrary").
2) Start the application and, while it is launching, use adb shell to log in to the emulator and look up the process PID with ps.
3) Attach to the process with strace -v -s 256 -p <your application PID>.

After the sleep finishes, the library is loaded and the console fills with lots of information that may be useful in tracking down the problem.
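Steps 1 to 3 come down to a small change on the Java side. Below is a minimal debug-only sketch; the class name DelayedLoader and the configurable delay are my own illustration, not code from the Android sources. It sleeps long enough to attach strace, then loads the library and logs the UnsatisfiedLinkError message instead of letting it crash the app. In an app you would probably log with android.util.Log. System.err is used here so the snippet also runs on a desktop JVM.

```java
// Hypothetical debug-only helper: gives you time to attach "strace -v -s 256 -p <pid>".
final class DelayedLoader {
    private DelayedLoader() {}

    /**
     * Sleeps for delayMillis, then calls System.loadLibrary(name).
     * Returns true if the library loaded, false if it failed with UnsatisfiedLinkError.
     */
    static boolean loadWithDelay(String name, long delayMillis) {
        try {
            // Window for: adb shell, ps, strace -v -s 256 -p <pid>
            Thread.sleep(delayMillis);
        } catch (InterruptedException e) {
            // Restore the interrupt flag and still attempt the load.
            Thread.currentThread().interrupt();
        }
        try {
            // dlopen() is reached from inside this call.
            System.loadLibrary(name);
            return true;
        } catch (UnsatisfiedLinkError e) {
            // The message is often just "Library ... not found"; the strace output
            // captured while this call runs is what shows the real cause.
            System.err.println("loadLibrary(" + name + ") failed: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        // 10 seconds, matching Thread.sleep(10000) from step 1.
        loadWithDelay("somelibrary", 10000L);
    }
}
```

Keep the delay out of release builds; it only exists to make the process easy to catch with strace.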

To get this working on a device, push the strace binary into the /data/local/tmp/ directory (this location is not mandatory, but that is where I keep it and it works), chmod 755 the file, and you are ready to go. You can use the binary from the emulator or download one from here. The second binary comes from an article about strace at http://benno.id.au/blog/2007/11/18/android-runtime-strace, which describes this tool's output in detail and was very useful for me.

arm-eabi-ld

This executable and many others can be found in Android source repository. This link is for the linux-x86 version, but others can be found there too.

To use it, you need the libraries that are required for linking. To check exactly which ones are needed, you can use arm-eabi-readelf from the same repository. Sample output for libwebcore.so:


mpigulski@desktop$ ./arm-eabi-readelf -d libwebcore.so

Dynamic section at offset 0x4d5f50 contains 36 entries:
Tag Type Name/Value
0x00000001 (NEEDED) Shared library: [libandroid_runtime.so]
0x00000001 (NEEDED) Shared library: [libnativehelper.so]
0x00000001 (NEEDED) Shared library: [libsqlite.so]
0x00000001 (NEEDED) Shared library: [libskia.so]
0x00000001 (NEEDED) Shared library: [libutils.so]
0x00000001 (NEEDED) Shared library: [libui.so]
0x00000001 (NEEDED) Shared library: [liblog.so]
0x00000001 (NEEDED) Shared library: [libcutils.so]
0x00000001 (NEEDED) Shared library: [libicuuc.so]
0x00000001 (NEEDED) Shared library: [libicudata.so]
0x00000001 (NEEDED) Shared library: [libicui18n.so]
0x00000001 (NEEDED) Shared library: [libmedia.so]
0x00000001 (NEEDED) Shared library: [libsurfaceflinger_client.so]
0x00000001 (NEEDED) Shared library: [libdl.so]
0x00000001 (NEEDED) Shared library: [libstlport.so]
0x00000001 (NEEDED) Shared library: [libc.so]
0x00000001 (NEEDED) Shared library: [libstdc++.so]
0x00000001 (NEEDED) Shared library: [libm.so]
0x0000000e (SONAME) Library soname: [libwebcore.so]
0x00000010 (SYMBOLIC) 0x0
0x00000019 (INIT_ARRAY) 0x480000
0x0000001b (INIT_ARRAYSZ) 52 (bytes)
0x00000004 (HASH) 0xb4
0x00000005 (STRTAB) 0x3ba8
0x00000006 (SYMTAB) 0x1308
0x0000000a (STRSZ) 15440 (bytes)
0x0000000b (SYMENT) 16 (bytes)
0x00000003 (PLTGOT) 0x4d6090
0x00000002 (PLTRELSZ) 4784 (bytes)
0x00000014 (PLTREL) REL
0x00000017 (JMPREL) 0x9d430
0x00000011 (REL) 0x77f8
0x00000012 (RELSZ) 613432 (bytes)
0x00000013 (RELENT) 8 (bytes)
0x6ffffffa (RELCOUNT) 75878
0x00000000 (NULL) 0x0
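When you need the NEEDED list for many libraries, copying it by hand gets tedious. Below is a minimal Java sketch that pulls the library names out of readelf -d text. It assumes the output format shown above, where a (NEEDED) tag is followed by "Shared library: [name]". The class name NeededParser is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts direct dependencies from "readelf -d" output.
final class NeededParser {
    // Matches lines such as:
    // 0x00000001 (NEEDED) Shared library: [libutils.so]
    private static final Pattern NEEDED =
            Pattern.compile("\\(NEEDED\\)\\s+Shared library:\\s+\\[([^\\]]+)\\]");

    private NeededParser() {}

    /** Returns the NEEDED entries in the order readelf printed them. */
    static List<String> parseNeeded(String readelfOutput) {
        List<String> libs = new ArrayList<>();
        Matcher m = NEEDED.matcher(readelfOutput);
        while (m.find()) {
            libs.add(m.group(1));
        }
        return libs;
    }

    public static void main(String[] args) {
        String sample = String.join("\n",
                "0x00000001 (NEEDED) Shared library: [libutils.so]",
                "0x00000001 (NEEDED) Shared library: [libc.so]",
                "0x0000000e (SONAME) Library soname: [libwebcore.so]");
        System.out.println(parseNeeded(sample)); // [libutils.so, libc.so]
    }
}
```

The SONAME line is skipped on purpose: only (NEEDED) entries are dependencies.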


The output of this tool, however, shows only the direct dependencies of the chosen library, and there may be (and almost certainly will be) transitive dependencies. To satisfy them you need a directory containing those libraries. The easiest way to get one is to pull everything from the target emulator or device with adb pull /system/lib/ /some/local/directory/.
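Walking the transitive dependencies is a simple breadth-first search over NEEDED entries. The sketch below assumes you have already collected each library's direct NEEDED list, for example by running arm-eabi-readelf -d on every file in the pulled directory. It reports the libraries that are reachable from the one you are loading but absent from that directory. The class name DependencyClosure and the sample data in main() are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: finds transitive dependencies missing from a pulled /system/lib copy.
final class DependencyClosure {
    private DependencyClosure() {}

    /**
     * directNeeded maps each library present in the local directory to its direct
     * NEEDED entries. Returns every library reachable from root that has no entry
     * in the map, i.e. a dependency that is missing from the directory.
     */
    static Set<String> missingDependencies(String root, Map<String, List<String>> directNeeded) {
        Set<String> seen = new LinkedHashSet<>();
        Set<String> missing = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        seen.add(root);
        queue.add(root);
        while (!queue.isEmpty()) {
            String lib = queue.poll();
            List<String> needed = directNeeded.get(lib);
            if (needed == null) {
                missing.add(lib); // not in the pulled directory
                continue;
            }
            for (String dep : needed) {
                if (seen.add(dep)) { // visit each library only once
                    queue.add(dep);
                }
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Hypothetical directory contents; libcutils.so is deliberately absent.
        Map<String, List<String>> pulled = new HashMap<>();
        pulled.put("libsomelibrary.so", Arrays.asList("libutils.so", "libc.so"));
        pulled.put("libutils.so", Arrays.asList("libcutils.so", "libc.so"));
        pulled.put("libc.so", Collections.<String>emptyList());
        System.out.println(missingDependencies("libsomelibrary.so", pulled)); // [libcutils.so]
    }
}
```

An empty result only means every file is present. It says nothing about symbols, which is exactly the case the linker check below catches.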

To show how it works, I pulled /system/lib/ from a Nexus One 2.2 and from the 2.2 emulator. Below are the results of checking the stock webkit library, which was crashing on the Nexus One 2.2 and running fine on the emulator.


mpigulski@desktop$ ./arm-eabi-ld libwebcore.so -rpath=./emulator-2.2-system-lib/
./arm-eabi-ld: warning: cannot find entry symbol _start; defaulting to 0000827c
mpigulski@desktop$ ./arm-eabi-ld libwebcore.so -rpath=./nexus-2.2-system-lib/
./arm-eabi-ld: warning: cannot find entry symbol _start; defaulting to 0000829c
libwebcore.so: undefined reference to `__aeabi_f2uiz'


From this output it is clear that the Nexus One 2.2 has different libraries, which lack one symbol required by the stock library. The next step was finding a library that provides this symbol and linking it statically, and this solved the problem.
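Comparing the two linker runs can also be automated. Below is a minimal sketch that extracts symbol names from GNU ld "undefined reference to `symbol'" messages and keeps the ones that appear only in the device run. It assumes the message format shown above. The class name UndefinedRefs is hypothetical, and the two strings in main() are copied from the output above.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: diffs undefined symbols between two arm-eabi-ld runs.
final class UndefinedRefs {
    // Matches: libwebcore.so: undefined reference to `__aeabi_f2uiz'
    private static final Pattern UNDEFINED = Pattern.compile("undefined reference to `([^']+)'");

    private UndefinedRefs() {}

    /** Returns the sorted set of symbols reported as undefined in the linker output. */
    static Set<String> undefinedSymbols(String ldOutput) {
        Set<String> symbols = new TreeSet<>();
        Matcher m = UNDEFINED.matcher(ldOutput);
        while (m.find()) {
            symbols.add(m.group(1));
        }
        return symbols;
    }

    /** Symbols undefined against the device libraries but resolved against the emulator ones. */
    static Set<String> onlyOnDevice(String emulatorOutput, String deviceOutput) {
        Set<String> result = undefinedSymbols(deviceOutput);
        result.removeAll(undefinedSymbols(emulatorOutput));
        return result;
    }

    public static void main(String[] args) {
        String emulator = "./arm-eabi-ld: warning: cannot find entry symbol _start; defaulting to 0000827c";
        String nexus = "./arm-eabi-ld: warning: cannot find entry symbol _start; defaulting to 0000829c\n"
                + "libwebcore.so: undefined reference to `__aeabi_f2uiz'";
        System.out.println(onlyOnDevice(emulator, nexus)); // [__aeabi_f2uiz]
    }
}
```

The "_start" warning is ignored because the library is not an executable, so only the undefined references matter here.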

To see other options for arm-eabi-ld, run arm-eabi-ld --help.