Some 64-bit details
Our lead VM engineer, Eliot Miranda, has posted a few details on the 64-bit work the team is doing:
The 64-bit System: Overview
The 64-bit implementation uses full 64-bit addresses for objects, so objects can fill the entire available address space. For an idea of today's effective limits, consider that current AMD x86-64 chipsets support a 48-bit virtual address and a 40-bit physical address, while the average 64-bit object is around 64 bytes. The theoretical maximum number of objects on such hardware is therefore (2 raisedTo: 40) / 64.0 = 2^34, or around 16 Giga objects, in a system with a terabyte of memory.
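To make the arithmetic concrete, here is a small Python sketch (Python is used purely for illustration; it is not VM code) that reproduces the (2 raisedTo: 40) / 64.0 estimate and, for comparison, the same figure for the full 48-bit virtual address space. It assumes every object has exactly the quoted average size of 64 bytes.

```python
# Back-of-the-envelope object-count limits for the figures quoted above.
# Assumption: every object has the quoted *average* size of 64 bytes.
AVERAGE_OBJECT_BYTES = 64
PHYSICAL_ADDRESS_BITS = 40   # current AMD x86-64 physical address width
VIRTUAL_ADDRESS_BITS = 48    # current AMD x86-64 virtual address width


def max_objects(address_bits: int, avg_bytes: int = AVERAGE_OBJECT_BYTES) -> int:
    """Number of average-sized objects that fit in 2**address_bits bytes."""
    return (2 ** address_bits) // avg_bytes


physical_limit = max_objects(PHYSICAL_ADDRESS_BITS)
virtual_limit = max_objects(VIRTUAL_ADDRESS_BITS)

print(physical_limit, physical_limit / 2 ** 30)  # 17179869184 16.0
print(virtual_limit / 2 ** 30)                   # 4096.0
```

The physical limit, not the 48-bit virtual one, is what bounds a real machine today: 2^34 objects is the "16 Giga objects" figure.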
The 64-bit System: Implementation
There are only three tagged (immediate) types: 61-bit 2's complement SmallIntegers, 61-bit unsigned Characters, and 61-bit SmallDoubles. A SmallDouble is a subset of the 64-bit IEEE double-precision format that provides the central 1/8th of the IEEE exponent range at full precision. This immediate floating-point format provides a very usable range (approximately -1.0d38 to 1.0d38) and overflows to full 64-bit boxed Doubles when results don't fit. It makes floating-point both faster and far more space-efficient: SmallDouble arithmetic is about 2.5 times faster than boxed Double arithmetic (though still about 2.5 times slower than SmallInteger arithmetic) and has no space overhead.
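The following Python sketch shows one way a 61-bit SmallDouble could be derived from an IEEE double: keep the sign and the 52-bit mantissa, and squeeze the 11-bit exponent into 8 bits by accepting only about the central 256 of the 2048 exponent values (1/8th). The post does not give the actual VisualWorks bit layout, so the tag value, the exponent offset of 896, and reserving encoded exponent 0 for ±0.0 are all assumptions made for illustration. A `None` result stands for "does not fit; box it as a full Double".

```python
import struct

# Hypothetical SmallDouble encoding, for illustration only.
TAG_BITS = 3
SMALLDOUBLE_TAG = 0b100       # assumed tag value in the low 3 bits
MANTISSA_BITS = 52
MANTISSA_MASK = (1 << MANTISSA_BITS) - 1
EXPONENT_OFFSET = 896         # keeps biased exponents 897..1151, roughly the central 1/8 of 0..2047


def double_to_bits(x: float) -> int:
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def bits_to_double(b: int) -> float:
    return struct.unpack("<d", struct.pack("<Q", b))[0]


def encode_small_double(x: float):
    """Return a tagged 64-bit word, or None if x must be boxed as a full Double."""
    bits = double_to_bits(x)
    sign = bits >> 63
    exponent = (bits >> MANTISSA_BITS) & 0x7FF
    mantissa = bits & MANTISSA_MASK
    if exponent == 0 and mantissa == 0:
        small_exp = 0                                 # reserved here for +0.0 and -0.0
    elif EXPONENT_OFFSET < exponent <= EXPONENT_OFFSET + 255:
        small_exp = exponent - EXPONENT_OFFSET        # 1..255 fits in 8 bits
    else:
        return None                                   # too big/small, denormal, inf or NaN
    payload = (sign << 60) | (small_exp << MANTISSA_BITS) | mantissa  # 1 + 8 + 52 = 61 bits
    return (payload << TAG_BITS) | SMALLDOUBLE_TAG


def decode_small_double(word: int) -> float:
    payload = word >> TAG_BITS
    sign = payload >> 60
    small_exp = (payload >> MANTISSA_BITS) & 0xFF
    mantissa = payload & MANTISSA_MASK
    exponent = 0 if small_exp == 0 else small_exp + EXPONENT_OFFSET
    return bits_to_double((sign << 63) | (exponent << MANTISSA_BITS) | mantissa)
```

Under these assumptions the largest encodable magnitude is just under 2^129 (about 6.8e38) and the smallest nonzero one is 2^-126 (about 1.2e-38), consistent with a range of roughly ±1.0d38. Because the mantissa is never truncated, any value that encodes round-trips exactly.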
The rationale for using only three of the seven possible immediate tag patterns is twofold. First, measurements show that little space is saved by tricks such as packing seven-byte symbols into immediates, because such short symbols intrinsically take up little space anyway, while providing access to these packed immediate types slows down the path for normal object access in places like the #at: primitives. Second, with only three tag types the #isImmediate, #isSmallInteger and #isSmallDouble tests can be made faster, since they need only test a single bit; these tests are performance-critical for inlined arithmetic and for object access in the various #at: and #at:put: primitives.
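Here is a Python sketch of why a small number of tag types keeps the type tests cheap. It assumes a hypothetical one-hot assignment of the three low bits (001 for SmallInteger, 010 for Character, 100 for SmallDouble) and that object pointers are 8-byte aligned, so their low three bits are zero; the post does not say which patterns VisualWorks actually uses. With that layout #isSmallInteger and #isSmallDouble each test a single bit, and #isImmediate is one AND against a mask. The sketch also tags and untags a 61-bit 2's complement SmallInteger.

```python
# Hypothetical tag assignment (one bit per immediate type); the real VisualWorks
# patterns are not given in the post. Object pointers are assumed 8-byte aligned,
# so their low three bits are always 000.
TAG_BITS = 3
TAG_MASK = 0b111
SMALLINTEGER_TAG = 0b001
CHARACTER_TAG = 0b010
SMALLDOUBLE_TAG = 0b100
WORD_MASK = (1 << 64) - 1           # model a 64-bit machine word

SMALLINT_MIN = -(1 << 60)           # 61-bit 2's complement range
SMALLINT_MAX = (1 << 60) - 1


def is_immediate(word: int) -> bool:
    return (word & TAG_MASK) != 0           # one AND against a mask


def is_small_integer(word: int) -> bool:
    return (word & SMALLINTEGER_TAG) != 0   # single-bit test


def is_character(word: int) -> bool:
    return (word & CHARACTER_TAG) != 0      # single-bit test


def is_small_double(word: int) -> bool:
    return (word & SMALLDOUBLE_TAG) != 0    # single-bit test


def tag_small_integer(n: int) -> int:
    if not SMALLINT_MIN <= n <= SMALLINT_MAX:
        raise OverflowError("does not fit in 61 bits; needs a large integer")
    return ((n << TAG_BITS) | SMALLINTEGER_TAG) & WORD_MASK


def untag_small_integer(word: int) -> int:
    value = word >> TAG_BITS        # unsigned 61-bit field
    if value >= 1 << 60:            # sign bit of the 61-bit field is set
        value -= 1 << 61
    return value
```

Note that the four unused three-bit patterns (011, 101, 110, 111) are exactly what a packed-symbol or similar extra immediate type would have to claim, and each such type would add another test on the #at: path.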
The 64-bit object representation is relatively more compact than the 32-bit one: object headers grow by only 33%, from 12 bytes to 16. In particular, object headers no longer reference their class object directly; instead they include a 20-bit "class index" that is used in all inline cache tests and in object instantiation. Classes are held in a sparse table and accessed by dereferencing the 20-bit index. Compared with a full 64-bit class pointer this saves 44 bits, while limiting the total number of classes in a way unlikely to be a problem for contemporary applications. Class objects themselves are accessed quite rarely, for example when a message send misses in the method caches and has to do a full class hierarchy lookup, or when the programmer explicitly accesses the class via the Object>>#class primitive. 64-bit objects have a 20-bit identity hash field (in fact the class index is the class's identity hash), giving 1 Meg hash values (up from 16383 in the 32-bit system) and a maximum of 1 Meg classes. The header of 64-bit pointer objects also includes the number of fixed fields (the number of named instance variables), so the accessing primitives #at: and #at:put: no longer have to indirect through the class to find how many named instance variables to skip. Consequently, array access performance is much improved.
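A Python model of the header fields described above may help. Only the field widths (20-bit class index, 20-bit identity hash) come from the post; the object model, the dict-backed sparse class table, leaving index 0 unused, and the names `ClassTable`, `Obj` and `basic_at` are hypothetical. The point is that inline caches compare small integers, the class table is consulted only on rare paths, and #at: finds the first indexed slot from the header's fixed-field count without touching the class.

```python
# Hypothetical model: field widths come from the post, everything else is illustrative.
CLASS_INDEX_BITS = 20
HASH_BITS = 20
MAX_CLASSES = 1 << CLASS_INDEX_BITS        # 1,048,576 ("1 Meg")


class ClassTable:
    """Sparse table mapping class index -> class; the index doubles as the class's identity hash."""

    def __init__(self):
        self._classes = {}                 # a dict stands in for the sparse table
        self._next = 1                     # index 0 left unused in this sketch

    def register(self, cls) -> int:
        if self._next >= MAX_CLASSES:
            raise RuntimeError("class table full")
        index = self._next
        self._next += 1
        self._classes[index] = cls
        return index                       # also the class's identity hash

    def class_at(self, index: int):
        return self._classes[index]        # needed only on cache misses, #class, etc.


class Obj:
    def __init__(self, class_index, identity_hash, num_fixed_fields, slots):
        assert 0 <= class_index < MAX_CLASSES
        assert 0 <= identity_hash < (1 << HASH_BITS)
        self.class_index = class_index             # what inline caches compare against
        self.identity_hash = identity_hash
        self.num_fixed_fields = num_fixed_fields   # named inst vars, cached in the header
        self.slots = slots                         # named inst vars, then indexed slots


def basic_at(obj: Obj, index: int):
    """1-based #at: that skips named inst vars using the header alone, with no class lookup."""
    if not 1 <= index <= len(obj.slots) - obj.num_fixed_fields:
        raise IndexError(index)
    return obj.slots[obj.num_fixed_fields + index - 1]
```

For example, an object with two named instance variables followed by three indexed slots answers its first indexed element from slot 2 (0-based), computed purely from `num_fixed_fields`.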
We have gratefully adopted an idea from Mark Van Gulik for tagging objects. Because object headers comprise two 64-bit words, an object can be placed either on an even or an odd 64-bit word modulo a 128-bit boundary. In the 64-bit system PermSpace objects are on the odd boundary and OldSpace objects on the even one. This makes the store check slightly faster, but, much more importantly, it means PermSpace can be placed anywhere in the address space. All that is required is for a PermSpace segment to have its object table aligned on an odd modulo 128-bit boundary. This makes shared PermSpace easy to implement, letting the operating system dictate where to memory-map a PermSpace segment. Thus bit 4 is a tag that distinguishes PermSpace objects from OldSpace objects, hence the name "tagged perm".
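A Python sketch of the "tagged perm" idea follows. It assumes that "bit 4" counts from 1, i.e. the bit with value 8, which is what separates an odd 64-bit word from an even one within a 128-bit boundary. It also assumes a simple bump allocator that rounds every object up to a multiple of 16 bytes so consecutive objects keep their segment's parity. The names `first_object_address` and `allocate` are hypothetical, not VM functions.

```python
# Hypothetical model of "tagged perm": the bit with value 8 in an object's address
# says which space it is in (assuming the post's "bit 4" counts from 1).
PERM_TAG = 0x8       # set: PermSpace ("odd"), clear: OldSpace ("even")
ALIGNMENT = 16       # 128 bits, the size of a two-word object header


def is_perm(addr: int) -> bool:
    return (addr & PERM_TAG) != 0


def is_old(addr: int) -> bool:
    return (addr & PERM_TAG) == 0


def first_object_address(segment_base: int, perm: bool) -> int:
    """Round up to a 128-bit boundary, then step to the odd 64-bit word for PermSpace."""
    base = (segment_base + ALIGNMENT - 1) & ~(ALIGNMENT - 1)
    return base + PERM_TAG if perm else base


def allocate(cursor: int, size_bytes: int) -> tuple[int, int]:
    """Bump-allocate one object; padding to 16 bytes keeps the next object's parity."""
    padded = (size_bytes + ALIGNMENT - 1) & ~(ALIGNMENT - 1)
    return cursor, cursor + padded
```

Because the space is encoded in each object's own address, a PermSpace segment can be mapped at whatever address the OS chooses; in this model only the parity of its first object matters.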
In the 32-bit system we require all of PermSpace to be above all of OldSpace, or vice versa. When shared perm was first implemented, OldSpace growth was not, so it was easy to implement shared PermSpace with the OS mapping it above all of OldSpace. But once we provided OldSpace growth and shrinkage by memory-mapping OldSpace segments, it became much more difficult to guarantee that PermSpace is mapped above all other heap segments, since the upper portion of the address space is typically where shared libraries, the stack segment and memory mappings all collide in a manner best determined by the OS and not amenable to precise control from an application program. Mapping memory at low addresses is also typically difficult, because the lower portion of the address space is where the C heap and the application's own code and data collide. Hence we have yet to reimplement shared perm in the 32-bit system.
By using the "tagged perm" scheme we can decouple PermSpace from any particular location, and hence easily share PermSpace and allow it to grow. This first preview release does not support shared PermSpace.
As implied above, a preview (early beta) of the 64-bit VM is on the latest release CD.

