Last talk of the day (not counting the STIC event later), and it's Michael Lucas-Smith talking about Smalltalk and C interfacing. First question: Why do we bother with C? C is the "standard" level of access to operating systems, databases, libraries, etc. It's what you use if you want to reuse existing code.
There are two main ways to interface: wrap an interface into the VM via primitives (relink/recompile the VM), or use a foreign interface (like DLLCC). There's a third way: recompile the foreign library so that it can talk to you. This is the best way to give C++ apps (that don't have a C API) an interface. Python does this heavily, and has some partial automation in that direction.
- C Header Parser (has limitations, more or less dated to the K&R spec)
- You end up doing the interface "by hand"
- VW does have good error handling with calls out to C, and can handle threading C calls.
- C functions defined in a large pragma definition
- Structs are subclasses of OSObject or OSPtr
Squeak, Dolphin, and VW are similar in terms of how they talk to C - Squeak does have the interesting VMMaker thing.
- Limit description of the interface
- Modify the VM by including the definition and recompiling
- In your Smalltalk code you basically add a primitive
- Dynamically compiles C code inline in a Smalltalk method - you can write C code in your methods
So - how do other languages do this? Python uses Boost.Python, which lets you make calls at the Python level very easily. Perl is mostly similar - open the DLL and then call the C API you want to hit. Ruby is the second worst (only Java is worse): you recompile the C library with additional macros, and then make calls. The Ruby folks don't seem to mind though - they do lots of C interfacing.
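To make the "open the DLL and then call the C API" style concrete, here is a minimal sketch in Python. Note that it uses the standard library's ctypes module rather than the Boost approach mentioned in the talk, and it assumes a POSIX system where the C library can be found. The helper name c_strlen is made up for illustration.

```python
import ctypes
import ctypes.util

# Assumption: a POSIX system. find_library may return None, in which case
# CDLL(None) falls back to the symbols already loaded into this process.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Describe the C signature by hand: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(text):
    """Hypothetical helper: call C's strlen on the UTF-8 bytes of text."""
    return libc.strlen(text.encode("utf-8"))


print(c_strlen("Smalltalk"))  # prints 9
```

The signature is written out by hand instead of being parsed from a header file, which is the same trade-off the talk describes for most of these languages.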
C# - no structure information from the header file at all. You import the DLL and specify the entry points (basically, a separate manifest file from the header file). Somewhat similar to the way VW works (at a high level).
Java - the hardest to integrate.
- Modify the DLL
- Write an interface library
- Java will fix up the arguments based on this "middle man"
Lisp - somewhat similar to the way Perl and Python do it. (Aside - varies by implementation?) Scheme is the same, but with different names.
Forth - they have a header file parser, like VW, but it works better. However, it looks like most Forth programs statically link libraries rather than use dynamic linking. There are also a lot of Forth implementations.
What about COM, .NET (etc)? For the VW COM interface, you subclass COMInterface (usually IUnknown), which uses a DNU to do dynamic lookups. The calls are easy to make. For DotNetConnect, you subclass off DotNETObject, and the actual calls are done using the C interface (as with DLLCC).
Some Lessons:
Don't Reinvent the Wheel - we interface to C because the libraries already exist. For instance: LibXSLT. VW has an XSL library, but it is not standards compliant (it was created pre-standard) - and thus does not work with XSL "in the wild". Much of the open source code out there is cross platform, so in that case you don't lose anything by using it. Even if it's single platform, that may not matter for your project.
Sometimes, you need to Reinvent the Wheel - LibTidy only worked on Win32 and Linux, and was unstable. The Mac and Unix versions were command line only, not a DLL. The end result: the time might have been better spent writing a Smalltalk version.
Don't use C interfaces - if you can use a command line stdin/stdout interface, you may be better off just doing that.
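A stdin/stdout interface like this can be sketched in a few lines. The example below is an illustration only: the "tool" is a stand-in child Python process that upper-cases its input, and the names FILTER_COMMAND and run_filter are made up. A real command-line tool would take the place of the command list.

```python
import subprocess
import sys

# Stand-in for an external command-line filter: a child Python process that
# upper-cases whatever arrives on stdin and writes it to stdout.
FILTER_COMMAND = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.write(sys.stdin.read().upper())",
]


def run_filter(text, command=FILTER_COMMAND):
    """Send text to the command's stdin and return what it writes to stdout."""
    result = subprocess.run(
        command, input=text, capture_output=True, text=True, check=True
    )
    return result.stdout


print(run_filter("<p>hello</p>"))  # prints <P>HELLO</P>
```

There are no struct layouts, calling conventions, or header files involved. The cost is a process launch per call, which matters only if the calls are frequent.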
Don't use C - Sometimes you're better off just writing the application in Smalltalk. Michael used to use LibASpell for spell checking in the XML Editor (but it was slow). Redoing it in Smalltalk with a better algorithm was simpler and better.
Use C - When you come across a library that's worth it (Michael's example: BerkeleyDB). In the process of doing this, a few bugs cropped up - issues with fragmentation in FixedSpace, for example.
Most slowness in C interfacing (VW) is from garbage created as a side effect. Michael's suggestion? Allocate on the stack for C Calls, as C does. Another lesson: don't use ephemerons for C objects, as this just spikes the GC harder.
What ends up happening? People implement an interface to the little bit of a library that they need access to, and no more (because dealing with the header files is so hard). Many of the libs Michael has built are pieces of the Windows API. So why is this hard? The C grammar is kind of insane. C is a partially context bound grammar. The C pre-processor is powerful enough to play Towers of Hanoi! C++ is a completely context bound grammar. C++ templates are virtually impossible to deal with, and the C and C++ ABIs have deficiencies in all of the areas they need to cover:
- size and alignment of data types
- layout of structured types
- calling conventions (cdecl, pascal, etc)
- register usage conventions
- interfaces for runtime arithmetic support
- object file formats
C++ has these, and adds more (name mangling!). Also exception handling, invoking constructors/destructors, layout, alignment and padding of classes, and layout and alignment of virtual tables. Interestingly, Intel has fixed this in x86-64 :)
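The "size and alignment" and "layout of structured types" items in the list above are easy to see from the outside. This is a small sketch using Python's ctypes (a choice made for this write-up, not something from the talk). The struct Record is hypothetical, and the exact numbers printed depend on the platform ABI, which is the point.

```python
import ctypes


class Record(ctypes.Structure):
    # Hypothetical C equivalent: struct Record { char tag; int value; };
    _fields_ = [("tag", ctypes.c_char), ("value", ctypes.c_int)]


# Bytes actually needed by the two fields, ignoring any padding.
field_bytes = ctypes.sizeof(ctypes.c_char) + ctypes.sizeof(ctypes.c_int)
# Whatever the compiler's layout rules add on top of that.
padding = ctypes.sizeof(Record) - field_bytes

print("sizeof(Record):", ctypes.sizeof(Record))
print("offset of value:", Record.value.offset)
print("alignment of int:", ctypes.alignment(ctypes.c_int))
print("padding bytes:", padding)
```

Any foreign interface has to reproduce these offsets exactly, which is why getting struct definitions out of header files matters so much.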
SWIG (Simplified Wrapper and Interface Generator) has its own syntax for describing interfaces. It generates a compiled version for an existing language - you need to create your own for each language. Python's Boost is similar, done better, for Python only (for now). It works for C++, and allows calling back and forth between Python and C++. GCC-XML outputs an XML manifest prior to compilation - complete reflection for C/C++. No need to parse header files with this. Pyste is a combination of Boost and GCC-XML to build Python/C++ interfaces. There's also c++filt, which claims to be able to de-mangle names. If it works, you could automate interfaces to C++.
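To give a feel for what de-mangling involves, here is a toy sketch that handles a tiny subset of the Itanium C++ ABI scheme used by GCC: plain free functions whose parameters are built-in types, pointers, and const. It is not how c++filt is implemented, and the function names are made up. Namespaces, classes, templates, and substitutions are all out of scope.

```python
# Single-letter codes for a few built-in types in the Itanium C++ ABI.
BUILTINS = {
    "v": "void", "b": "bool", "c": "char", "i": "int",
    "l": "long", "f": "float", "d": "double",
}


def parse_type(symbol, pos):
    """Parse one type starting at pos; return (text, next position)."""
    if pos >= len(symbol):
        raise ValueError("truncated type")
    code = symbol[pos]
    if code == "P":  # pointer to the type that follows
        inner, pos = parse_type(symbol, pos + 1)
        return inner + "*", pos
    if code == "K":  # const-qualified version of the type that follows
        inner, pos = parse_type(symbol, pos + 1)
        return inner + " const", pos
    if code not in BUILTINS:
        raise ValueError("unsupported type code: " + code)
    return BUILTINS[code], pos + 1


def demangle(symbol):
    """Demangle names of the form _Z<length><name><parameter types>."""
    if not symbol.startswith("_Z"):
        raise ValueError("not a mangled name")
    pos = 2
    digits = ""
    while pos < len(symbol) and symbol[pos].isdigit():
        digits += symbol[pos]
        pos += 1
    if not digits:
        raise ValueError("nested and special names are not handled")
    name = symbol[pos:pos + int(digits)]
    pos += int(digits)
    params = []
    while pos < len(symbol):
        text, pos = parse_type(symbol, pos)
        params.append(text)
    if params == ["void"]:  # a lone 'v' means an empty parameter list
        params = []
    return name + "(" + ", ".join(params) + ")"


print(demangle("_Z3fooi"))    # prints foo(int)
print(demangle("_Z3bazPKc"))  # prints baz(char const*)
```

Even this subset shows why mangled names are useful for automation: the parameter types are recoverable from the symbol alone, without the header file.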
- Smalltalk modeling
- Stack allocation
- ANSI C compliant parsing
- All common and OS libs pre-parsed and delivered with VW
- C++ interfacing
- C++ reflection
- Automatic generation of classes based off C structures and functions that use those structures