In response to the very end of this article, where Rodney Bates said this:
Smalltalk pays a high price elsewhere for taking object orientation to the extreme, notably in complete loss of static typing and serious runtime efficiency penalties. Special, one-instance forms of classes are, for many programming problems, not as good a conceptual match as modules. But at least it provides a single, consistent, and syntactically explicit call mechanism.
I thought I'd ask our lead VM engineer - Eliot Miranda - for some details on method lookup in Smalltalk (VisualWorks in particular):
Rodney, you should read the following books & papers (in order); they'll help you understand Smalltalk's performance.
[Goldberg83] Adele Goldberg, David Robson, Smalltalk-80: The Language and its Implementation, Addison-Wesley, 1983, ISBN 0.201.11371.6.
Now out of print but available by combining
- Adele Goldberg, David Robson, Smalltalk-80: The Language, Addison-Wesley, 1989, 0.201.13688.0
- The Blue Book (Implementation)
- [Deutsch84] L. Peter Deutsch, Allan M. Schiffman, "Efficient Implementation of the Smalltalk-80 System", 11th Annual Symposium on Principles of Programming Languages, pp. 297-302, January 1984, ACM.
Context Management in VisualWorks 5i
Eliot Miranda - Available on the web here(PDF)
But briefly, here's how things are faster than you expect. Message selectors are maintained by the system as a pool of unique strings (Symbols), so that equality comparison of message selectors requires only comparing the addresses of the symbol objects. In the 70's and early 80's message lookup was optimized by the run-time system maintaining a small (1024 or 2048 entry) method lookup cache that remembers recent method lookups. The table is hashed by the identity hash of the receiver's class and the message selector. On early systems the id hash is equivalent to the object's address. Method lookup then becomes:hash := receiver class hash + selector hash bitAnd: CacheSize. (cache at: hash) class == receiver class and: [(cache at: hash) selector selector]) ifTrue: [targetMethod := (cache at: hash) method] ifFalse: [targetMethod := self lookup: selector in: receiver class. (cache at: hash) class: receiver class; selector: selector; method: targetMethod.
Since then Dynamic Translation (a.k.a. JIT compilation) has ncreased performance by nearly an order of magnitude. The run-time system does not interpret bytecode; instead it maintains a cache of the most recent used methods compiled on-demand to machine code. We call these nmethods. Every message send is first translated into the following machine code sequence:classRegister := selector. "i.e. load a register with the address of a symbol" call unlinkedSend1Args. "i.e. call a run-time routine to find the method, encoding the arg count in the call for arg counts 0 to small n"
When unlinkedSend is invoked it locates the receiver from e.g. argument registers, obtains the selector from classregister and uses a modified version of first-level method lookup cache algorithm above to locate an nmethod for the lookup. If an nmethod isn't found it searches the class hierarchy, translates the method to native code and stores it in the first-level lookup cache. And now the clever bit... The send site is rewritten fromclassRegister := selector. call unlinkedSend1Args.
toclassRegister := class. "i.e. whatever the class of the receiver was when unlinkedSend was called" call nmethod.entryPoint
The nmethod's code at entry point then checks that the class of the current receiver agrees with that stored in classregister, e.g.entry: tempRegister := receiver class. tempRegister != classRegister ifTrue: [self handleSendMiss]. ...
So if the receiver's class is the same as it was when the send site was rewritten the target method is the same and we're done. So we simply have a class dereference, a regiser assignment and a comparison. 90% of send sites are monomorphic. So this speeds things up enormously.
Polymorphic send stes are sped up by using "polymorphc inline caches" or PICs, which look like a jump table, doing a series of class comparisons.
There is also substantial mechanism to allow native stack frames to be used, creatring context objects for method activations only when required.
Smalltalk method lookup is fast