
VisualWorks THAPI

VisualWorks Threaded Interconnect

Eliot Miranda, Cincom Systems, Inc., March, 2002 (originally published January, 1997)

Introduction

This document describes the VisualWorks Threaded API (THAPI) mechanism, a multithreaded extension of the standard VisualWorks DLL and C Connect (DLLCC) facilities. This mechanism allows a VisualWorks application to make multiple concurrent, possibly blocking, calls to external code, each on its own independent thread of control, and it allows VisualWorks to handle multiple concurrent callbacks from external code running on any thread in the VisualWorks process.

The document provides an overview of the design and its necessity, a comprehensive example, a discussion of limitations and performance, and a reference section.

Contents

  • Terminology
  • Design Overview
  • The Blocking I/O Problem
  • Asynchronous I/O
  • Multithreaded I/O
  • Argument Handling Using Multiple I/O Threads
  • Thread Management and Correspondence to Processes
  • THAPI Example
  • Threaded External Methods
  • Additional Control Over Threads
  • Thread Limit and Low Tide
  • Attaching Processes to Threads
  • Objects and Fixed Space
  • Callbacks
  • Limitations
  • Use of Oops and Message Sends
  • FixedSpace Capacity
  • Thread Priority
  • The Debugger
  • Process Termination
  • Maximum Number of Threads
  • Performance Considerations
  • THAPI Reference
  • Specifying Threaded Calls and Callbacks
  • Controlling Thread Creation
  • Thread Reservation
  • Fixed Space
  • Foreign Callback Process Priority
  • Example Sources

  • Terminology

  • Smalltalk process (STP)
    a Smalltalk object representing a single thread of control within the system.

  • semaphore
    a Smalltalk object implementing a counting, queuing semaphore, used to synchronise Smalltalk processes.

  • process scheduler
    a Smalltalk object that performs the scheduling of Smalltalk processes.

  • heavyweight process
    an operating system object, the combination of a set of one or more threads that control a single space in memory that is the operating system's natural unit of independent activity.

  • thread
    an operating system object representing an independent thread of control in a heavyweight process. A thread is the operating system's natural unit of scheduling. Note well: these differ fundamentally from library threads and Smalltalk processes, both of which are scheduled by some virtual machine running on a single thread above, and independent of, the operating system's own thread scheduling.

  • condition variable
    an object implementing a counting, queuing semaphore, used to synchronise threads.

  • mutex
    a variable used to provide mutually exclusive access to some shared resource by a set of threads, typically used via spin-waiting.

  • signal
    any one of the following:
      • the signalling of a Smalltalk semaphore, which causes the scheduling of a Smalltalk process waiting on the semaphore.
      • the signalling of a condition variable, which causes the scheduling of a thread waiting on the condition variable.
      • an asynchronous software interrupt of a heavyweight process or thread.

  • thread-safe
    the property of software components that they may be used safely by more than one thread running concurrently. Thread-safety typically is achieved using synchronisation mechanisms such as mutexes to serialize access by multiple threads.

  • foreign code
    for the purposes of this document foreign code refers to any code written in a language other than Smalltalk which is included in and used by a Smalltalk application.

  • Design Overview

    This section provides the rationale for the VisualWorks threaded interconnect. It first explains the problems and limitations imposed by the VisualWorks ObjectEngine prior to the threaded interconnect, and the strategies used to work around these limitations. Finally, it considers the design space for the threaded interconnect and explains why the current architecture was chosen.

    The Blocking I/O Problem

    All Smalltalk-80 virtual machines in the lineage from the Xerox D-machine implementations through to HPS (the VisualWorks Object Engine) have supported the Smalltalk process model, which provides lightweight processes within a single Smalltalk object space. In all of these implementations the scheduling of Smalltalk processes (STPs) is done by the virtual machine itself, either in response to external events (e.g. from timers and I/O devices) or internal manipulation of Smalltalk semaphores and processes. The ParcPlace HPS virtual machine implements this model upon conventional operating-system platforms that provide their own multiprocessing model and I/O architecture. In all of these contexts, I/O is mediated by the operating system (OS); the only safe access to I/O devices is through system calls into the OS. In all of these contexts, HPS runs as a single heavyweight process with one thread of control. HPS schedules multiple STPs internally, multiplexing (sharing) the single thread of control amongst the various STPs, as appropriate. This multiprocessing is invisible to the outside world; from the host OS's point-of-view HPS appears to be a single thread of control.

    When HPS performs I/O on behalf of Smalltalk code, it must actually make an I/O system call. At this point, control passes to the host OS, which performs the requested I/O operation. Since the operation is likely to require external I/O devices to respond, which can take considerable time, the host OS often cannot respond to the I/O request immediately. Since the process invoking the I/O operation is waiting for the result of the I/O operation, it cannot continue; thus, the host OS blocks the process, at least until the I/O operation completes, and schedules other runnable processes in the meantime. Consequently, HPS often blocks when performing I/O operations. Even though there might be other runnable STPs, they cannot run, since HPS itself is blocked by the OS, and is therefore unavailable to schedule and run those STPs.

    This state of affairs is extremely deleterious to the performance of Smalltalk applications that have heavy I/O operations and concurrently handle multiple threads of control. This includes middle-tier application servers in a three-tier architecture. In this context, multiple clients send requests to an application server that cause it to make multiple requests on a database. The application server can achieve higher throughput if it can handle multiple concurrent requests for data from clients. However, each time the application server makes a request to the database, the entire application server is blocked, awaiting the completion of the database I/O request. No incoming requests can be responded to, and no pending data can be delivered back to the client(s) until the I/O operation completes.

    Asynchronous I/O

    HPS uses an asynchronous I/O architecture for various I/O connections: HPS issues a request for service and is later notified asynchronously, via a software interrupt, that the service is available. Socket and terminal I/O are done this way. This architecture can also be used to build a non-blocking interconnect. One or more separate heavyweight processes, possibly multithreaded, and implemented in a suitable system language, e.g. C, implement a blocking I/O call server. Smalltalk communicates asynchronously with the call server, e.g. via sockets, and the multithreaded call server makes blocking calls on Smalltalk's behalf, notifying Smalltalk when each call completes.

    This approach does work, and is used for example in the VisualWave Server 1.0 product. However, there are several problems with this approach:

  • The use of multiple heavyweight processes results in poor performance and excessive resource usage. Each heavyweight process has its own address space with a private copy of all non-shared resources. Context switches between heavyweight processes are much slower than context switches amongst threads in the same process, because the context switch involves a change of address space as well as a change of thread of control.
  • Because communication between the two cannot share objects, copying semantics must be used, resulting in poor performance. Data must be copied several times: from Smalltalk data to network device driver, from device driver to network, from network to device driver, from device driver to call-server data-structure, before being passed to the call. Returning results involves the same operations.
  • Typical software interrupt schemes specify only that a service might be available on some unspecified channel. All open channels must then be polled (if possible) to discover which channel has data available.
  • There are more modes of failure, and there is more difficulty managing resources, since the call server is remote.
  • This approach is hardly transparent (copying semantics are used, there is a connection involved) and hence cannot be hidden easily beneath the existing DLLCC API.
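
    The polling step mentioned in the third point above can be sketched in C. This is only an illustration under POSIX assumptions: poll() is a standard POSIX call, but the names first_readable, MAX_CHANNELS and DEMO_MAIN are invented here and are not part of HPS or DLLCC.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

#define MAX_CHANNELS 16   /* hypothetical limit, chosen for this sketch */

/* Return the index of the first descriptor that has data ready to read,
   or -1 if none is ready (or on error). Never blocks: the timeout is 0. */
int first_readable(const int *fds, int count)
{
    struct pollfd pfds[MAX_CHANNELS];
    int i;

    assert(fds != NULL);
    if (count < 0 || count > MAX_CHANNELS)
        return -1;
    for (i = 0; i < count; i++) {
        pfds[i].fd = fds[i];
        pfds[i].events = POLLIN;
        pfds[i].revents = 0;
    }
    if (poll(pfds, (nfds_t)count, 0) <= 0)
        return -1;                       /* nothing ready, or poll failed */
    for (i = 0; i < count; i++)
        if (pfds[i].revents & POLLIN)
            return i;
    return -1;
}

#ifdef DEMO_MAIN
#include <stdio.h>
int main(void)
{
    int a[2], b[2], readers[2];

    if (pipe(a) != 0 || pipe(b) != 0)
        return 1;
    readers[0] = a[0];
    readers[1] = b[0];
    printf("before write: %d\n", first_readable(readers, 2));
    if (write(b[1], "x", 1) != 1)
        return 1;
    printf("after write:  %d\n", first_readable(readers, 2));
    return 0;
}
#endif
```

    Every notification costs a scan over all open channels, which is one reason the interrupt-plus-poll scheme scales poorly as the number of connections grows.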

    Multithreaded I/O

    A number of architectures can be used to ameliorate the blocking I/O problem, from one involving multiple invocations of HPS, communicating amongst each other to process multiple concurrent requests, to the use of host threads to process I/O requests. Most modern operating systems provide for independent threads of control within the address space of a single heavyweight OS process. If one thread makes an I/O request, the host OS blocks that thread until the I/O request is completed, but the OS is free to schedule other threads within the process whilst the I/O operation is pending. All approaches that do not use multiple threads suffer from the same problem: HPS can perform only one blocking I/O operation at a time, during which no other activities can occur. Only solutions using multiple threads can possibly handle multiple blocking I/O requests whilst allowing other activity concurrent with I/O.

    Threads managed by the operating system are known as kernel or OS threads. Many thread implementations are available on operating systems which lack kernel threads. These implementations are called library threads since they are typically implemented as a library of routines. In these systems the appearance of multiple threads is simulated by multiplexing a single OS thread amongst a number of library threads. In fact HPS does just this with Smalltalk processes. However, all library thread systems suffer from the same fatal flaw. Once control passes to the kernel no further scheduling can occur until control returns, blocking other threads for the duration of the call. To avoid blocking one must use an asynchronous I/O architecture as discussed in the previous section.

    Kernel threads alone are insufficient to achieve non-blocking I/O. The operating system must also allow threads to run in parallel with I/O operations. The OS can schedule other threads while I/O operations are in progress in three circumstances: the underlying machine has multiple CPUs, only one of which need handle I/O requests; the machine has separate I/O processors that handle I/O requests handed off to them by the host OS in parallel with the CPUs' normal processing; or the I/O device is sufficiently slow that the host OS can issue requests to the device and interrogate it at a later time by scheduling an interrupt timer. Some machines provide multiple threads but lack the hardware features that enable the scheduling of other threads whilst I/O is in progress. On such platforms the hardware limits what can be achieved; the threaded interconnect depends both on the existence of multiple threads and on an appropriately architected host OS and hardware combination.

    Two mappings of the current HPS system to a system using multiple threads have been considered; firstly, the mapping of each runnable STP to its own thread, and secondly, the use of a single thread to schedule Smalltalk processes, with multiple threads to perform I/O operations, as required. The former approach is fraught with complications, the most important among these being:

  • All access to shared resources must be carefully managed to avoid contention. This includes the heap. Extensive modifications to the system would be necessary to ensure correct sharing of the heap between multiple Smalltalk threads and the garbage collector. The architecture of the current garbage collector depends crucially upon the fact that the garbage collector is able to preempt the system whilst it scavenges. This is arranged by having the system call the scavenger at appropriate times. Such a simple and efficient preemption scheme would not suffice in a multithreaded context.
  • The standard VisualWorks class library makes assumptions about the underlying Smalltalk scheduling model; i.e., the class library is not thread safe. Areas such as the windowing system would require redesign to function safely in a native threaded context.
  • The existing Smalltalk process model is used extensively in the current system. It is easy to see that mapping this model down to host threads cannot be done without changes. Any changes to the process model would necessitate concomitant changes to the rest of the system.

    A simpler approach is to use multiple threads only to make I/O calls. As in the current system, a single thread is used to run all Smalltalk processes. Whenever a potentially blocking I/O call is to be made, a separate thread is handed the information necessary to make the call, executes the call, and is blocked by the OS until the call completes and the OS reschedules the thread. Meanwhile, the ObjectEngine thread blocks the Smalltalk process that invoked the operation, scheduling other runnable STPs. Once the I/O thread is scheduled, it passes back the result(s) of the call to the ObjectEngine thread. The ObjectEngine thread is then free to resume the STP that invoked the call. Once the I/O thread has passed its results back, it can either terminate, or remain in a quiescent state, awaiting subsequent requests for I/O operations.

    This scheme creates the illusion of a truly multithreaded Smalltalk. Note that the true multithreaded architecture provides only the ability to run more than one STP concurrently on true multiprocessors; i.e., machines with multiple CPUs. On machines with a single CPU, only one thread can run at a time; therefore, even though Smalltalk might have multiple runnable processes mapped to multiple runnable threads, only one of these can run at a time. Whilst the use of machines with multiple CPUs is likely to increase, especially to exploit the parallelism of these machines, currently few customers are doing so. On single processor machines, HPS implements the current Smalltalk process model with excellent performance (process switch time of about 25 microseconds on a 60 MHz Pentium). Viewed as a black box, on a single processor the architecture presented herein provides the illusion of a true multithreaded implementation, except that:

  • Callouts are slower, since threaded callouts involve at least two thread switches.
  • Interrupt latency is high (HPS polls for interrupts infrequently).
  • Programmers must consciously use the threaded callout mechanism.

    Using multiple threads only for implementing I/O operations is much simpler to implement than a truly host-threaded implementation. On single processor machines, this approach is equivalent to a true multithreaded implementation. Hence, this architecture has been chosen as the basis for the threaded interconnect, called THAPI for short.
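
    The hand-off between a single "engine" thread and a separate I/O thread can be sketched in C with POSIX threads. This is a minimal illustration of the idea only, not the ObjectEngine's implementation, which this document does not show. All names (io_call, io_thread, io_call_start, io_call_is_done, io_call_wait, DEMO_MAIN) are hypothetical, and a pthreads platform is assumed.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>
#include <unistd.h>

/* One pending call: its arguments, its result, and a completion flag. */
struct io_call {
    int fd;
    char *buf;
    size_t size;
    long result;
    int done;
    pthread_mutex_t lock;
    pthread_cond_t finished;
};

/* I/O thread body: make the blocking call, then publish the result. */
static void *io_thread(void *arg)
{
    struct io_call *c = arg;
    long r = (long)read(c->fd, c->buf, c->size);  /* blocks only this thread */

    pthread_mutex_lock(&c->lock);
    c->result = r;
    c->done = 1;
    pthread_cond_signal(&c->finished);
    pthread_mutex_unlock(&c->lock);
    return NULL;
}

/* Hand the call to a new thread. Returns 0 on success. */
int io_call_start(struct io_call *c, pthread_t *t, int fd, char *buf, size_t size)
{
    assert(c != NULL && t != NULL && buf != NULL);
    c->fd = fd;
    c->buf = buf;
    c->size = size;
    c->result = -1;
    c->done = 0;
    pthread_mutex_init(&c->lock, NULL);
    pthread_cond_init(&c->finished, NULL);
    return pthread_create(t, NULL, io_thread, c);
}

/* Non-blocking check, as an engine thread might make between other work. */
int io_call_is_done(struct io_call *c)
{
    int d;

    pthread_mutex_lock(&c->lock);
    d = c->done;
    pthread_mutex_unlock(&c->lock);
    return d;
}

/* Wait for completion, release resources, and return the call's result. */
long io_call_wait(struct io_call *c, pthread_t t)
{
    long r;

    pthread_mutex_lock(&c->lock);
    while (!c->done)
        pthread_cond_wait(&c->finished, &c->lock);
    r = c->result;
    pthread_mutex_unlock(&c->lock);
    pthread_join(t, NULL);
    pthread_mutex_destroy(&c->lock);
    pthread_cond_destroy(&c->finished);
    return r;
}

#ifdef DEMO_MAIN
#include <stdio.h>
int main(void)
{
    int fds[2];
    char buf[16];
    struct io_call c;
    pthread_t t;

    if (pipe(fds) != 0 || io_call_start(&c, &t, fds[0], buf, sizeof buf) != 0)
        return 1;
    /* The read is blocked on the empty pipe, but this thread keeps running. */
    printf("done yet? %d\n", io_call_is_done(&c));
    if (write(fds[1], "ping", 4) != 4)
        return 1;
    printf("read returned %ld\n", io_call_wait(&c, t));
    return 0;
}
#endif
```

    The sketch creates one thread per call for brevity; a quiescent thread awaiting further requests, as described above, would avoid the cost of repeated thread creation.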

    Argument Handling Using Multiple I/O Threads

    HPS is based on a copying garbage collector. The garbage collector routinely moves objects within the heap at arbitrary times. This is not a problem for normal synchronous calls, because for the duration of a call the garbage collector is blocked along with the rest of HPS, and hence no relocation can occur. However, moving objects within the heap is a problem if the call invokes code that tries to remember the address of an object for subsequent calls, or if the code invokes a callback. In either case, when HPS runs, the garbage collector might move the object, invalidating the information retained by the foreign code. In the current system, it is the programmer's responsibility to cope with foreign code that depends on being passed fixed pointers by using DLLCC's malloc facilities, which allow Smalltalk programmers to manipulate data on the C heap (obtained via malloc) and reference this data using instances of CPointer.

    Since HPS is not blocked during a threaded call, the garbage collector is free to move objects whilst the call is in progress. If the garbage collector moves an object being used as an argument to a threaded call, disaster often follows, either by passing invalid data to the foreign code, or by the foreign code writing to invalid addresses and corrupting the Smalltalk heap. For this reason, all reference parameters in threaded calls must refer to data that does not move for the duration of the call. There are three ways to achieve this. Firstly, you can use the existing DLLCC malloc facilities. Secondly, you can arrange it so that the object's data is copied to some fixed space at the start of the call, and copied back once the call completes. Lastly, the object's data can be relocated to a fixed space, where it resides for at least the duration of the call. The second alternative does not preserve referential integrity; that is, if an object is passed to more than one concurrent threaded call, or if Smalltalk code accesses the object whilst the call is in progress, the participants see different copies of the data. This is a serious problem in applications that want to share data, e.g. applications that share pools of I/O buffers between I/O drivers and applications, as supported by some operating systems such as Windows NT. Further, the overhead of copying data to and from some fixed space might be unacceptably high.

    Consequently, the Smalltalk heap now has an additional space called FixedSpace. The bodies (contents) of objects in FixedSpace do not move, preserving referential integrity and allowing them to be used freely as the arguments in threaded (and normal) calls. Promotion of objects to FixedSpace is automatic, and occurs when a mobile object is passed as an argument to a threaded call. Further, objects can be created in FixedSpace using new object-creation primitives.

    Thread Management and the Correspondence Between Smalltalk Processes and Threads

    The cost of thread creation can be high, so the threaded interconnect should manage a pool of threads, rather than creating a thread for each threaded call. (For example, making a trivial threaded call on a 60MHz Pentium running Windows NT 4.0 takes about 300 microseconds. If a thread must be created in addition, the call takes about 3500 microseconds, an overhead of around 1100 percent.) Since some applications require that an API be used by a specific thread (e.g. a debugging process observing some other process under Windows NT), the interconnect should also allow programmers to ensure that a specific thread is used to make a call. The THAPI meets these requirements by maintaining a pool of active threads, and, as much as possible, by using the same I/O thread to perform calls on behalf of a specific Smalltalk process. Callbacks from I/O threads are run on the stack of the Smalltalk process that issued the callout. Callbacks from foreign threads (threads other than ones created by the threaded interconnect) are handled on new Smalltalk processes that remain associated with the foreign thread until it returns from the callback. Facilities are provided to reserve an I/O thread for the sole use of a specific Smalltalk process, and to control the size of the pool.

    THAPI Example

    From the DLLCC programmer's perspective, THAPI is a relatively small extension of the existing DLLCC facilities. A good way to explain these new facilities is through a simple example that illustrates all features except callbacks. In the example, a multithreaded "server" is constructed with a number of Smalltalk processes waiting for "requests" on an I/O connection. For simplicity, the requests are generated and served within a single image.

    In this example, a "request" is simply a character string, and a response to a request is to display the string in the System Transcript. Requests are written to a single pipe. (A pipe is an I/O channel represented by a pair of file descriptors. A write of data via the write file descriptor makes that data available for reads via the read file descriptor.) Each "server" process is a loop that blocks, waiting for data to appear on the pipe before writing this data to the transcript. Another process makes "requests" by writing data to the pipe. To give the Smalltalk system something computationally intensive to do whilst all this is going on, the data written to the pipe is the result of running a simple benchmark.
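
    The behaviour of a pipe can be seen in a few lines of C. The sketch below assumes the POSIX form of pipe(); the function name pipe_round_trip, the MAX_MSG limit and the DEMO_MAIN macro are invented for illustration. The message is kept short so that neither the write nor the read can block.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

#define MAX_MSG 512   /* hypothetical limit; small enough to fit in any POSIX pipe */

/* Write msg into a fresh pipe and read it back from the other end.
   Returns the number of bytes read, or -1 on error. */
long pipe_round_trip(const char *msg, char *out, size_t cap)
{
    int fds[2];                       /* fds[0] is the read end, fds[1] the write end */
    size_t len;
    ssize_t n = -1;

    assert(msg != NULL && out != NULL);
    len = strlen(msg);
    if (len == 0 || len > MAX_MSG || len >= cap || pipe(fds) < 0)
        return -1;
    if (write(fds[1], msg, len) == (ssize_t)len) {
        n = read(fds[0], out, cap - 1);   /* data written above is now readable */
        if (n >= 0)
            out[n] = '\0';
    }
    close(fds[0]);
    close(fds[1]);
    return (long)n;
}

#ifdef DEMO_MAIN
#include <stdio.h>
int main(void)
{
    char buf[64];
    long n = pipe_round_trip("a request", buf, sizeof buf);

    if (n < 0)
        return 1;
    printf("read %ld bytes: %s\n", n, buf);
    return 0;
}
#endif
```

    Had the read been issued before any data was written, it would have blocked the calling thread until a writer supplied some.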

    This example, even though very simple, is sufficient to require all THAPI facilities other than callbacks. If you were to run the example without THAPI, Smalltalk would freeze immediately. As soon as a Smalltalk process attempts to wait for a request by performing a blocking read on the pipe, the ObjectEngine blocks in the read call. Hence, the Smalltalk process that generates requests can no longer run. With no data written to the pipe, the server processes never return from the call, resulting in deadlock.

    Our example class is NonBlockingPipeInterface, since it waits for data on a pipe without blocking the ObjectEngine. It is a subclass of ExternalInterface, since it has a number of external methods. Amongst its instance variables are a pair of file descriptors, infd and outfd, for the pipe, and a boolean flag, running, that determines when each "server" process terminates.

    Each "server" loops in the reader: method, reading data and then writing the data to the Transcript.

    NonBlockingPipeInterface methods for server
    reader: id
        "Read from the pipe as long as running is true.  Print what ever is
         read from the pipe to the Transcript and tag it with id."
        | buffer count |
        buffer := CIntegerType char gcMalloc: 1025.                    "Use a buffer on the C heap for the read call."
        [running] whileTrue:                                           "Continue until running is false"
            [count := self read: infd with: buffer with: 1024.         "Make a blocking read from pipe on its own thread."
            TranscriptProtect critical:                                "Use a mutex to serialize writing to Transcript"
                [Transcript cr; print: id; tab.                        "Print this reader's id tag."
                (count > 1024 or: [count < 0])                         "Check read result and complain if it's in error."
                    ifTrue: [Transcript nextPutAll: 'READ RETURNED '; print: count]
                    ifFalse:
                        [buffer at: count put: 0.                      "Null-terminate then copy data as a String."
                        Transcript nextPutAll: buffer copyCStringFromHeap].
                Transcript endEntry]]
    
    

    The read buffer is allocated on the C heap. The buffer is 1025 bytes long, large enough for 1024 characters and a null-terminating byte. Smalltalk objects can be moved by the garbage collector, which can and might run whilst a threaded call is in progress. Consequently, normal Smalltalk objects cannot be used as container arguments to _threaded calls, since the garbage collector might move them independently of the threaded call. Later, this example is refined to include the use of fixed-space objects.

    The Transcript is not thread safe, so access by multiple reader processes attempting to write to the Transcript at the same time must be serialized. A class variable, TranscriptProtect, is used to achieve this.

    NonBlockingPipeInterface class methods for class initialization
    initialize
        "Initialize the mutex for serializing writes to the Transcript and a constant to open the pipe in binary mode."
        TranscriptProtect := Semaphore forMutualExclusion.
        O_BINARY := 16r8000                "from msdev\include\fcntl.h"
    
        "self initialize"
    
    

    Threaded External Methods

    As for the threaded call itself, the message send that invokes the call is indistinguishable from a normal DLLCC call. The threaded-ness is a property of the external method specifying the call. Threaded calls are specified by using the _threaded pseudo-qualifier in the C pragma of an ExternalMethod. Note that the _threaded keyword must follow the function's return-type. The type of the buffer argument is _oopref (the write:with:with: method later in this section shows the form of such a pragma).

    An invocation of the method causes the calling Smalltalk process to block until the read call returns. Meanwhile, other runnable Smalltalk processes can execute. To perform the call the ObjectEngine provides a thread that is available to make the call, passes all the information necessary to make the call (the function and arguments) to the thread, and blocks the calling Smalltalk process until the thread returns a result. Once the result is returned, the ObjectEngine passes the result back to the process and allows it to continue. Importantly, the thread making the call is given the next higher priority to the ObjectEngine thread to ensure that it makes progress, even if the Smalltalk system has other runnable processes for the ObjectEngine to execute.

    Since pipes have limited capacity, it is possible to block when writing to a pipe. A write to a pipe might block until sufficient reads have been done to make space available for the write. Thus, to ensure that the example's computation is not interrupted by potentially blocking writes, a separate process is used to perform the writes via threaded calls. The write process reads results from the generator through an instance variable, results, which is a SharedQueue. A SharedQueue is a thread-safe way of communicating between processes, somewhat analogous to a pipe. An object added to the queue via nextPut: is available via next. If the queue is empty, the calling process blocks in the next method until an object is added to the queue via nextPut:.
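
    For readers more familiar with C than with Smalltalk, the blocking behaviour of next and nextPut: resembles a queue guarded by a mutex and condition variables. The following is a rough analogy only, assuming POSIX threads; it is not how SharedQueue is implemented, and unlike SharedQueue it has a fixed capacity. The names shared_queue, sq_init, sq_next_put, sq_next, QUEUE_CAP and DEMO_MAIN are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

#define QUEUE_CAP 8   /* bounded for simplicity; a simplification of this sketch */

struct shared_queue {
    const char *items[QUEUE_CAP];
    int head, count;
    pthread_mutex_t lock;
    pthread_cond_t not_empty, not_full;
};

void sq_init(struct shared_queue *q)
{
    assert(q != NULL);
    q->head = 0;
    q->count = 0;
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->not_empty, NULL);
    pthread_cond_init(&q->not_full, NULL);
}

/* Add an item at the tail; waits while the queue is full. */
void sq_next_put(struct shared_queue *q, const char *item)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == QUEUE_CAP)
        pthread_cond_wait(&q->not_full, &q->lock);
    q->items[(q->head + q->count) % QUEUE_CAP] = item;
    q->count++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

/* Remove and return the oldest item; waits while the queue is empty. */
const char *sq_next(struct shared_queue *q)
{
    const char *item;

    pthread_mutex_lock(&q->lock);
    while (q->count == 0)
        pthread_cond_wait(&q->not_empty, &q->lock);
    item = q->items[q->head];
    q->head = (q->head + 1) % QUEUE_CAP;
    q->count--;
    pthread_cond_signal(&q->not_full);
    pthread_mutex_unlock(&q->lock);
    return item;
}

#ifdef DEMO_MAIN
#include <stdio.h>
static void *producer(void *arg)
{
    sq_next_put(arg, "nfib 10 = 177");
    return NULL;
}

int main(void)
{
    struct shared_queue q;
    pthread_t t;

    sq_init(&q);
    if (pthread_create(&t, NULL, producer, &q) != 0)
        return 1;
    printf("%s\n", sq_next(&q));   /* waits until the producer has added the item */
    pthread_join(t, NULL);
    return 0;
}
#endif
```

    In the Smalltalk example the waiting parties are Smalltalk processes scheduled by the ObjectEngine, not OS threads, so a process blocked in next does not block the ObjectEngine.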

    The writer is rather similar to a reader:

    NonBlockingPipeInterface methods for server
    writer
        "Loop writing strings from the results queue to the pipe."
        | result buffer writeCount |
        buffer := CIntegerType char gcMalloc: 1024.             "Use a buffer on the C heap for the write call."
    
        [true] whileTrue:
            ["Get the next result from the results shared queue. This process waits until one is available.
              Convert the result to a ByteArray since the type of the buffer is char (an integer)."
            result := results next asByteArray. 
    
            buffer copyAt: 0 from: result size: result size startingAt: 1.      "Copy the string into the buffer."
            writeCount := self write: outfd with: buffer with: result size.      "Write the buffer's data to the pipe."
    
            writeCount ~= result size ifTrue:                   "Check the write operation succeeded."
                [TranscriptProtect critical:
                    [Transcript
                        cr;
                        nextPutAll: 'WRITE RETURNED '; print: writeCount;
                        nextPutAll: ' EXPECTED '; print: result size;
                        nextPut: $!; endEntry]]]
    
    

    The writer does not test running, since it is explicitly terminated. The write is also a threaded call:

    NonBlockingPipeInterface methods for procedures
    write: fd with: buffer with: size
        "Invoke the write system call on its own thread and hence avoid blocking the ObjectEngine."
        <C: long _threaded write(int fd, _oopref *buffer, unsigned long size)>
        ^self externalAccessFailed
    
    

    To implement the rest of the interface, you first need some interface functions to open and close the pipe, and you need a benchmark to run.

    NonBlockingPipeInterface methods for procedures
    close: fd
        "Close the file descriptor fd"
        <C: int close(int fd)>
        ^self externalAccessFailed
    
    pipe: arg
        "UNIX pipe creation function."
        <C: int pipe(int [])>
        ^self externalAccessFailed
    
    pipe: arg ofSize: size mode: textMode
        "NT pipe creation function."
        <C: int pipe(int arg[], unsigned int size, int textMode)>
        ^self externalAccessFailed
    
    
    Integer methods for mathematical functions
    nfib
        "The nfib benchmark calculates a rough measure of activations per second. This is a version of fibonacci in
         which 1 is added for each activation. The result is therefore equal to the number of activations required to
         calculate that result.  To get the 'nfib' figure of nfib activations per second choose a value which takes nfib
         about 30 seconds to calculate. Then divide the result by the time taken, yielding activations per second."
        self < 2 ifTrue: [^1] ifFalse: [^(self - 1) nfib + (self - 2) nfib + 1]
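
    For comparison, the same benchmark is easy to express in C. This is a sketch for illustration only; nfibs_per_second and DEMO_MAIN are names invented here. The rate follows the description in the method comment (result divided by the time taken), with the time given in milliseconds. Where a zero elapsed time would make the rate undefined, this sketch returns 0.0.

```c
#include <assert.h>

/* nfib: Fibonacci with 1 added per activation, so the result equals
   the number of calls needed to compute it. */
unsigned long nfib(unsigned int n)
{
    return n < 2 ? 1UL : nfib(n - 1) + nfib(n - 2) + 1UL;
}

/* Activations per second for a run that took elapsed_ms milliseconds.
   Returns 0.0 when the elapsed time is not positive. */
double nfibs_per_second(unsigned long activations, double elapsed_ms)
{
    return elapsed_ms > 0.0 ? (double)activations * 1000.0 / elapsed_ms : 0.0;
}

#ifdef DEMO_MAIN
#include <stdio.h>
int main(void)
{
    unsigned long r = nfib(10);

    assert(r == 177UL);
    printf("nfib 10 = %lu\n", r);
    return 0;
}
#endif
```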
    
    

    Another method is used to open the pipe, since it must be opened differently under UNIX and Windows. Also, a separate method is used to terminate the example, since it might be run in the background and need to be terminated from some other process. (The terminate method is careful to do nothing if already terminated. On process termination, any unwind blocks in the process are run. Hence, if terminate is sent from some other process, it gets sent again from the unwind block in the readers: method when terminate kills the generator process.) An instance variable generator is used to refer to the process running the benchmark, and the instance variable readers is used to refer to the collection of readers.

    NonBlockingPipeInterface methods for initialize-release
    openPipes
        "Create the pipe.  Note that pipe in the MSVC run-time library is different from the standard UNIX pipe."
        | fds |
        fds := CIntegerType int gcMalloc: 2.
        (OSHandle currentOS == #win32
                ifTrue: [self pipe: fds ofSize: 1024 mode: O_BINARY]
                ifFalse: [self pipe: fds]) < 0
            ifTrue: [self error: 'pipe open failed.'].
        infd := fds at: 0.
        outfd := fds at: 1
    
    terminate
        "Terminate all the relevant processes and close the pipe."
        running ifFalse: [^self].                   "Don't do anything if already terminated."
        running := false. 
    
        generator == Processor activeProcess ifFalse: [generator terminate].
    
        "Write sufficient data to the pipe so that all readers get data, and hence by checking running, stop."
        readers size * 2 timesRepeat: [results nextPut: 'so long!'].
    
        "Delay until the results have been written by the writer and then kill the writer.
         Yield doesn't work if the process doing terminate has a higher priority than the writer so use a delay."
        [results isEmpty] whileFalse: [(Delay forMilliseconds: 20) wait].
    
        writer terminate.
        self close: infd; close: outfd.             "Close the pipe" 
    
    

    The "main loop" opens the pipe, creates a shared queue to communicate results to the writer, spawns the readers and writer, and then loops, generating data. On unwind, it calls terminate to shut down.

    NonBlockingPipeInterface methods for public access
    readers: n
        "Run the example with n reader processes."
    
        results := SharedQueue new.
        self openPipes.
        generator := Processor activeProcess.   "remember the generator process for terminate."
        running := true.
    
        "Fork a writer process to write data to the pipe.  Its priority is higher than the generator
         to ensure writes happen promptly."
        writer := [self writer] forkAt: generator priority + 1.
    
        "Fork n readers at a higher priority so that results get read and displayed."
        readers := (1 to: n) collect: [:i| [Processor yield. self reader: i] forkAt: generator priority + 1].
    
        "Generate some data.  Use a benchmark the author is excessively fond of.  But anything would do."
        [| i r t s nfibs |
            s := String new writeStream.
            i := 0.
            [t := Time millisecondsToRun: [r := i nfib].
            s reset.
            nfibs := Number errorSignal handle: [:ex| '??'] do: [(r * 1000.0 / t) rounded].
            s       nextPutAll: 'nfib '; print: i; nextPutAll: ' = '; print: r;
                tab; tab;
                nextPutAll: 'nfibs '; print: nfibs;
                nextPutAll: ' ('; print: t / 1000.0; nextPutAll: ' seconds)'.
    
            results nextPut: s contents.        "Put datum in results shared queue for the writer to consume."
    
            "Increase the value from which we compute nfib, limiting it to one that takes 30 seconds or less to run."
            i := t > 30000 ifTrue: [0] ifFalse: [i + 1]] repeat]
        valueNowOrOnUnwindDo: [self terminate]
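
    Because readers: loops until its process is terminated, one way to try the example without tying up the calling process is to fork it in the background and send terminate later. The following is only a sketch under stated assumptions: the variable name example, the 30-second run time, and the choice of Processor userBackgroundPriority are illustrative and are not part of the original example.

```smalltalk
| example |
example := NonBlockingPipeInterface new.

"Run the generator loop in a background process; readers: does not return by itself."
[example readers: 10] forkAt: Processor userBackgroundPriority.

"Let the example run for a while (30 seconds is an arbitrary choice)."
(Delay forSeconds: 30) wait.

"Sent from a different process: this terminates the generator, whose unwind block
 sends terminate again; the second send finds running false and does nothing."
example terminate
```

    Sending terminate from the forking process exercises the case described earlier, in which terminate is sent twice and the guard on running makes the second send a no-op.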
    
    

    The underlying C functions for accessing the pipe are in the C library. On Windows the C library is one of the MSVCRTNN.DLL DLLs. On Solaris the C library is /usr/lib/libc.so, and on Digital UNIX it is /usr/shlib/libc.so. You can use a single interface class for all these cases, provided the interface copes with the libraryNotFoundSignal, which is raised when an attempt is made to open a nonexistent library. For example, the libraryNotFoundSignal signal is raised if the interface tries to open /usr/shlib/libc.so on a Windows machine.

    ExternalInterface supports a standard idiom for doing just this. The class declaration should include the full set of library files and directories for all systems, and on the class side of the interface you implement the libraryFilesSearchSignals method to return the Signal or SignalCollection of the signals to be ignored during library loading. Thus, you need the following method to avoid raising a signal when using the interface's procedures:

    NonBlockingPipeInterface class methods for private
    libraryFilesSearchSignals
        "Answer the Signal or SignalCollection used to handle exceptions raised when scanning for library files.
         The signals answered by this method are ignored by the library search machinery. Clients
         should not answer signals they wish to receive."
    
        ^ExternalLibraryHolder libraryNotFoundSignal
    
    

    On Windows you also need to know where to look for MSVCRTNN.DLL. One enhancement in VisualWorks 2.5.2 is the ability to use environment variables in the list of library files and directories. (You can also use 'patterns' to match against OSHandle currentPlatformID. Browse ExternalLibrary>> findFile: inDirectories: for a full description.) In the following, $windir expands to the value of the windir environment variable; for example, C:\WIN95. Thus, the following class declaration loads the appropriate C library on Windows 95, Windows NT 3.51 and 4.0, Solaris, and Digital UNIX.

    ExternalInterface subclass: #NonBlockingPipeInterface
        includeFiles: ''
        includeDirectories: ''
        libraryFiles: 'libc.so msvcrt40.dll msvcrt20.dll '
        libraryDirectories: '/usr/shlib /usr/lib $windir\system $windir\system32 '
        generateMethods: ''
        beVirtual: false
        optimizationLevel: #full
        instanceVariableNames: 'infd outfd generator writer readers results running '
        classVariableNames: 'O_BINARY TranscriptProtect '
        poolDictionaries: 'NonBlockingPipeInterfaceDictionary '
        category: 'ExternalInterface- THAPI Example'
    
    

    To run the example, evaluate an expression such as NonBlockingPipeInterface new readers: 10.

    [Screenshot: transcript output when running this example on a 60MHz Pentium running Windows 95.]

    Additional Control Over Threads

    Thread Limit and Low Tide

    The example is extended next to add some control over the number of threads that can be created. THAPI provides a hard limit on the maximum number of threads that can be created and a low-tide limit on the number of quiescent threads. When the ObjectEngine starts up, it initializes both the upper and lower thread limits to 32.

    A thread is created when a threaded call is made by a process with no associated thread and no unassociated threads exist in the pool. For the duration of the call, the thread is then associated with the calling process and is used in any nested threaded calls. For example, if a callback occurs during the call, the callback runs in the same process that made the callout. Any threaded callouts made from this process whilst the outermost threaded call is still in progress are made by the same thread.

    Once the outermost threaded call returns, the thread is disassociated from the calling process and is returned to the pool. The thread can then be used to perform a call on behalf of any process. Thus, a burst of concurrent threaded calls can result in the creation of a number of threads which, when the calls return, end up unassociated in the pool. The live thread low tide is used to control the size of the pool. If the total number of threads maintained by Obje