general

F-Spot, Glorp and VisualWorks

December 4, 2009

I've been using Linux as my primary desktop platform for some years now. I generally try to keep up with the releases and stick with the default choices as much as possible. Recently I tried to use F-Spot because it's the default photo manager for GNOME now. It's got some nice features and is generally OK, although not very flexible. Very much in the spirit of today's UI design dogmas ("You can't handle flexibility!"). Anyway, I noticed that F-Spot uses sqlite3 as its database, so I wasn't too afraid to spend some effort tagging pictures etc.

Recently, as I was upgrading my computers, I decided to move the pictures to a different location. Unfortunately F-Spot doesn't seem to provide a way to update its database accordingly. Poking around in the database, it seemed to be a fairly simple database update, so I decided to whip up a quick Glorp-based database mapping and do the update with a script.

The database has a PHOTOS table with the following definition:

CREATE TABLE photos (
	id			INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, 
	time			INTEGER NOT NULL, 
	base_uri		STRING NOT NULL, 
	filename		STRING NOT NULL, 
	description		TEXT NOT NULL, 
	roll_id			INTEGER NOT NULL, 
	default_version_id	INTEGER NOT NULL, 
	rating			INTEGER NULL, 
	md5_sum			TEXT NULL
);

The path to the picture is stored in the base_uri field, usually looking something like 'file:///home/user/Photos/...'. I needed to change all of them to something like 'file:///pub/photos....' instead. So, first I created a Photo class with a simplified version of the above:

	id  database id
	time  time taken
	base_uri  location of the photo
	filename  file name of the photo
	description  any notes

Mappings are defined on subclasses of DescriptorSystem, so I created one and started with a description of the class model:

classModelForPhoto: aModel

	aModel newAttributeNamed: #id.
	aModel newAttributeNamed: #time type: Timestamp.
	aModel newAttributeNamed: #base_uri type: String.
	aModel newAttributeNamed: #filename type: String.
	aModel newAttributeNamed: #description type: String.

then the table description.

tableForPHOTOS: aTable

	(aTable createFieldNamed: 'id' type: (self fakeSequenceFor: aTable)) bePrimaryKey.
	(aTable createFieldNamed: 'time' type: platform int4) beIndexed.
	aTable createFieldNamed: 'base_uri' type: platform varchar.
	aTable createFieldNamed: 'filename' type: platform varchar.
	aTable createFieldNamed: 'description' type: platform text.

and finally the mapping between the two:

descriptorForPhoto: aDescriptor

	| table |
	table := self tableNamed: 'PHOTOS'.
	aDescriptor table: table.
	(aDescriptor newMapping: DirectMapping) from: #id to: (table fieldNamed: 'id').
	(aDescriptor newMapping: DirectMapping) from: #base_uri to: (table fieldNamed: 'base_uri').
	(aDescriptor newMapping: DirectMapping) from: #filename to: (table fieldNamed: 'filename').
	(aDescriptor newMapping: DirectMapping) from: #time to: (table fieldNamed: 'time').
	(aDescriptor newMapping: DirectMapping) from: #description to: (table fieldNamed: 'description').

 

With these in place I could try to connect to the database. For that I needed to provide the connection information, so I added two class-side methods:

newLogin

	^(Login new)
		database: SQLite3Platform new;
		connectString: (PortableFilename named: '$(HOME)/.config/f-spot/photos.db') asFilename asString.

newSession

	^self sessionForLogin: self newLogin

With this I could invoke #newSession and get a connected session back. Time to start experimenting with the database.

Reading a photo is easy:

	session readOneOf: Photo.

To find all the locations from which I had imported pictures, I used this:

	query := (Query read: Photo) retrieve: [ :e | e base_uri ].
	(session execute: query) asSet.

It reads all the base_uri values and puts them into a Set. A smarter database query can do this more efficiently, but this was fine as well in my database of about 6k pictures. I found out I imported pictures from two locations. I decided to deal with them one by one. To perform the update I ran the following:

	photos := session read: Photo where: [ :p | p base_uri like: '%home/mk/Photos%' ].
	session modify: photos in: [
		photos do: [ :p | p base_uri: (p base_uri copyReplaceAll: 'home/mk/Photos' with: 'pub/photos') ] ].

It reads each photo with the selected location in base_uri and updates it with the new one. Then I did the same for the second location. The entire update operation took less than 20 seconds. Later I found out that there's a plugin for F-Spot for this sort of migration, but its comment said that it can take a few hours. I don't know how big a database they had in mind, but that sounds a bit excessive still.

Since then I fleshed out the mappings, created a Glorp Workbook so that it's more convenient for quick experiments (you get a toolbar button for easy access) and packed it all up. I published the package to the public repository as F-Spot, hoping it might be useful to someone else too. As far as future plans go, there really aren't any beyond finishing the mapping layer. One thing I'm considering is that I find the imports into F-Spot excruciatingly slow. I might use this package for that task instead.

 

posted by Martin Kobetic

general

WebSupport updates

December 4, 2009

Our Seaside effort yields some useful byproducts, including improvements to the, so far rather Spartan, WebSupport package. This package now provides HttpClient and HttpRequest extensions simplifying submission of HTML form data through the HTTP POST method.

In general, form data can be submitted in a "url encoded" format in a simple, single-part HTTP request (content-type: application/x-www-form-urlencoded), or each data entry can be submitted as an individual part in a multipart HTTP request (content-type: multipart/form-data). Multipart messages are used when form data contains entries with relatively large values, for example when a form has external files attached to it for upload to the server. More information about HTML forms can be found at http://www.w3.org/TR/html401/interact/forms.html#h-17.13.

The default behavior of WebSupport extensions is to submit forms as simple requests. Form entries can be added individually using the #addFormKey:value: message, or set at once using the #formData: message, which takes a collection of Associations. Note that #formData: replaces any previous form content. The following example

	stream := String new writeStream.
	(HttpRequest post: 'http://localhost/xx/ValueOfFoo')
		addFormKey: 'foo' value: 'bar';
		addFormKey: 'file'  value: 'myFile';
		writeOn: stream.
	stream contents

yields this result:

POST /xx/ValueOfFoo HTTP/1.1
Host: localhost
Content-type: application/x-www-form-urlencoded
Content-length: 19

foo=bar&file=myFile

An alternative way to post a form is through HttpClient. In this case the request gets executed automatically and the result is the response from the server.

	HttpClient new
		post: 'http://localhost/xx/ValueOfFoo' 
		formData: (
			Array
				with: 'foo' -> 'bar'
				with: 'file' -> 'myFile').

To force the form to submit as a multipart message, send #beMultipart to the request at any point. Any previously added entries will be automatically converted to message parts. Note however that conversion of multipart messages back to simple messages is not supported, as it is not always possible without potentially losing information.

	stream := String new writeStream.
	(HttpRequest post: 'http://localhost/xx/ValueOfFoo')
		beMultipart;
		addFormKey: 'foo' value: 'bar';
		addFormKey: 'file'  value: 'myFile';
		writeOn: stream.
	stream contents

and the result is

POST /xx/ValueOfFoo HTTP/1.1
Host: localhost
Content-type: multipart/form-data;boundary="=_vw0.98992842109405d_="
Content-length: 183

--=_vw0.98992842109405d_=
Content-disposition: form-data;name=foo

bar
--=_vw0.98992842109405d_=
Content-disposition: form-data;name=file

myFile
--=_vw0.98992842109405d_=--

File entries can be added using the message #addFormKey:filename:source:. Adding a file entry automatically forces the message to become multipart, so that it can capture both the entry key and the filename.

	stream := String new writeStream.
	(HttpRequest post: 'http://localhost/xx/ValueOfFoo')
		addFormKey: 'foo' value: 'bar';
		addFormKey: 'text'  filename: 'text.txt' source: 'some text' readStream;
		writeOn: stream.
	stream contents

POST /xx/ValueOfFoo HTTP/1.1
Host: localhost
Content-type: multipart/form-data;boundary="=_vw0.015112462460581d_="
Content-length: 247

--=_vw0.015112462460581d_=
Content-disposition: form-data;name=foo

bar
--=_vw0.015112462460581d_=
Content-type: text/plain;charset=utf_8
Content-disposition: form-data;name=text;filename=text.txt

some text
--=_vw0.015112462460581d_=--

Adding a file entry attempts to guess the appropriate Content-Type for that part from the filename extension. If it doesn't succeed, the content type is set to the default, i.e. application/octet-stream. File names with non-ASCII characters will be automatically encoded using UTF8 encoding. UTF8 will also be used for the file contents if the source is a character stream (as opposed to a byte stream).

Adding an entry to a multipart message returns the newly created part. That allows you to modify any of the default settings or to add new ones. Here's an example changing the filename and file contents encoding to ISO8859-2:

	stream := String new writeStream.
	request := HttpRequest post: 'http://localhost/xx/ValueOfFoo'.
	part := request addFormKey: 'czech'
				filename: 'kůň.txt'
				source: 'Příliš žluťoučký kůň úpěl ďábelské ódy.' withCRs readStream.
	part headerCharset: #'iso-8859-2';
		charset: #'iso-8859-2'.
	request writeOn: stream.
	stream contents

POST /xx/ValueOfFoo HTTP/1.1
Host: localhost
Content-type: multipart/form-data;boundary="=_vw0.74617905623567d_="
Content-length: 228

--=_vw0.74617905623567d_=
Content-type: text/plain;charset=iso-8859-2
Content-disposition: form-data;name=czech;filename="=?iso-8859-2?B?a/nyLnR4dA==?="

Pøíli¹ ¾lu»ouèký kùò úpìl ïábelské ódy.
--=_vw0.74617905623567d_=--

There's also an API to parse messages containing forms in any of the supported formats. Just send #formData to the HTTP message. The result is a collection of associations, in the same form as the input to the #formData: message.

	(HttpRequest post: 'http://localhost/xx/ValueOfFoo')
		addFormKey: 'foo' value: 'bar';
		addFormKey: 'file'  value: 'myFile';
		formData

OrderedCollection ('foo'->'bar' 'file'->'myFile')

File entry values will be entire message parts so that all the associated information can be accessed.

	request := (HttpRequest post: 'http://localhost/xx/ValueOfFoo')
		addFormKey: 'foo' value: 'bar';
		addFormKey: 'text'  filename: 'text.txt' source: 'some text' readStream;
		yourself.
	part := request formData last value.
	part contents

some text

If you'd like to give the new code a try just load it up from the public repository.

posted by Martin Kobetic

general

ResourcefulTestCase

December 4, 2009

The core SUnit package provides support for shared test resources via the TestResource class. A TestCase that wants to use TestResources is expected to list all its resource classes in its class-side #resources method. Individual test case methods then access the resources via the resource classes, usually as default, singleton instances. That provides potentially interesting levels of flexibility, but access to the resources themselves is not exactly convenient. In my experience the vast majority of cases involving TestResources keep repeating the 'self resources first default blah' incantation over and over again, where blah is the name of the real resource the case cares about, managed in a blah instance variable of the corresponding TestResource subclass. A more palatable way is to add the same instance variables to the TestCase subclass as well and copy the resource pieces there in the TestCase>>setUp method. Then you can access the resources directly as instance variables in your test case methods. This way the test methods are clean again, but when you want to employ test resources you still have to go through the following sequence of steps:

  1. create a TestResource subclass
  2. add it to the TestCase class>>resources method
  3. add the same set of instance variables to the TestCase subclass
  4. copy the contents of the TestResource default instance to the TestCase instance variables in the TestCase>>setUp method

Naturally, as you realize you're doing this over and over again, you start thinking: wouldn't it be nice if you didn't have to? My first thought was, what if the TestSuite that gets built out of a given TestCase subclass didn't simply create empty TestCase instances, but instead first created a prototype instance and invoked something like #suiteSetUp on it, which would initialize the shared resources and put them directly into the instance variables of the case prototype? Then all the test cases would be built by simply copying the prototype instance and would therefore have the resource inst vars automagically initialized the same way as the prototype. Tear down could be performed by invoking #suiteTearDown on either the prototype or any of the cases. It doesn't really matter which one; you just have to make sure it is executed exactly once.

Of course while I was enjoying myself following this thread of thought, I forgot one important detail. TestCase instances are not created just before the suite runs. They are often created much, much earlier. That of course directly conflicts with the desire to initialize the resources just before they are needed and to shut them down right after the run is complete. I was just about to descend into yet another desperate attempt to rewrite most of SUnit when Alan Knight, who happened to be traveling with me at that moment, responded to my loud complaints with a simple: "Just put them into class variables."

And so the ResourcefulTestCase was born. It is an abstract TestCase subclass with simplified support for shared test resources. Instead of separate classes of test resources, it makes sure that if there are class-side #setUp and #tearDown methods defined on the class, they run before and after the test suite that gets built out of this test class. This allows one to initialize and store shared resources in class-side variables in the #setUp method. It's probably easiest to use shared class variables for easy access from the instance-side test case methods. I also like that the capitalized first letter nicely highlights in the test code the difference between the shared resources and the private stuff in case instance variables. Obviously the resources need to be torn down in the class-side #tearDown method. It is usually also desirable to nil out the variables so that the class doesn't hang on to garbage. The nilling out could be done automatically with a bit of meta-programming, but since it's often also important to finalize/close/release the resources properly, I figured it's better to force the user to deal with that, rather than facilitate potentially serious leaks with more automated magic. It's probably also a good idea to call super implementations of the setUp/tearDown methods so that case hierarchies work well.

Here's a quick summary of how to create a test case with resources:

  1. Make the test class a subclass of ResourcefulTestCase:
    XProgramming.SUnit defineClass: #MyTest
    	superclass: #{XProgramming.SUnit.ResourcefulTestCase}
    	indexedType: #none
    	private: false
    	instanceVariableNames: ''
    	classInstanceVariableNames: ''
    	imports: ''
    	category: 'My Tests'
    
    
  2. Add class variables for the shared resources.
  3. Add class side setUp method and initialize the test resources there
    MyTest class>>setUp
    
    	A := Object new.
    	Client := HttpClient new connect: 'testserver'
    
    
  4. Add class side tearDown method releasing the resources
    MyTest class>>tearDown
    
    	A := nil.
    	Client close.
    	Client := nil.
    
    

And that's it. All nicely and intuitively packaged in a single class. Now you can simply use the resources in your test methods:

MyTest>>testResources

	self assert: A class == Object.
	self assert: (Client get: 'index.html') isSuccess
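
The advice about calling the super implementations matters as soon as one resourceful test class subclasses another. Here is a minimal sketch, assuming a hypothetical subclass MySubTest of MyTest with one extra class variable B (both names are made up for illustration and are not part of the published package):

```smalltalk
MySubTest class>>setUp

	"Let MyTest set up A and Client first, then add our own resource."
	super setUp.
	B := OrderedCollection new.

MySubTest class>>tearDown

	"Release our own resource first, then let MyTest release A and Client."
	B := nil.
	super tearDown.
```

Tearing down in the reverse order of set up keeps the subclass from touching resources that the superclass has already released.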

The supporting code is published in the public repository in a package called SUnit-SimpleResources.

posted by Martin Kobetic

general

No End of Line End Confusion.

December 4, 2009

It's interesting that even though it's now decades old we're still running into this fundamentally rather trivial problem. It is especially pronounced in heavily cross-platform environments like VisualWorks. As soon as you have multiple environments, e.g. Windows and Linux, and you move text files between the two, you're pretty much guaranteed to run into files with doubled or tripled line-ends in them sooner or later. Often you can see both line-end conventions mixed up in the same file. The purpose of this article is to provide a reasonably complete picture of what's going on in VisualWorks in this regard and how we are supposed to deal with it. Even seasoned smalltalkers sometimes miss some part of the whole picture, making it more difficult to deal with the consequences.

 

What's going on?

 

Here's a quick recapitulation. There are 3 line end conventions in common use: CR (MacOS), LF (Unix) and CRLF (Windows). CR and LF here refer to characters with ASCII codes 13 and 10 respectively. This historical fact can have a profound effect on an environment with the ambition to have fully binary portable images across platforms. Let's say we try to emulate the platform specific line end convention, i.e. use characters CR and LF in String instances on Windows, just LF on Unix, etc. This would make binary portability pretty difficult. Whenever we moved an image from Windows to Unix and started it up, we would need to convert all the existing String instances from the CRLF convention to the LF convention. What's worse, it would effectively change the character size of many Strings. While that might be somewhat manageable (we already do something similar for platforms with different endianness), there are further implications. If strings get shortened, the positions of streams set up on top of them will suddenly be off as well. In fact any kind of "pointer" into the String will be potentially broken. That would be much harder to fix. And if it's not fixed, things will start breaking. Binary portability wouldn't really work.

So the only alternative is to pick one convention and stick with it everywhere. Historically the choice has been CR. I don't know why, maybe there was a certain affinity of the original Smalltalk-80 developers towards the Macintosh platform back then. Ironically, we may eventually end up being the only environment keeping the CR convention, as MacOS/X is moving Macs away from the CR convention towards the LF convention of its Unix-based core. Nevertheless, the important point is that in Smalltalk line ends in String instances should *always* be marked with character CR only, no matter what platform it's running on. While character LF is a perfectly valid character and it's pretty easy to construct a String with that character in it, a String with LFs is usually a sign of some kind of anomaly and you can be sure that some components of the vast Smalltalk library won't be able to cope with it. This is very important, so remember, only CRs in your Strings! This simplification has other advantages too: things like reading a line of text from a stream become simply "upTo: Character cr" anywhere.
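
To make the last point concrete, here is a small workspace sketch (an illustration only, using an internal stream and the #withCRs helper that also appears later in this article) that splits a CR-delimited String into lines with nothing but #upTo:

```smalltalk
| stream lines |
"#withCRs turns each backslash into a CR, so this String holds three lines."
stream := 'first\second\third' withCRs readStream.
lines := OrderedCollection new.
"#upTo: answers everything up to, but not including, the next CR."
[stream atEnd] whileFalse: [lines add: (stream upTo: Character cr)].
lines	"OrderedCollection ('first' 'second' 'third')"
```

The same loop works unchanged on any platform precisely because the in-image convention never varies.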

 

Tools of the trade

 

OK, now that's settled, what about strings that come from outside ? It's not a Smalltalk only world out there (not yet anyway) and the text files on your harddrive will usually use whatever is the native convention for a given platform. The key point here is that these strings are coming from "outside". They should be converted to the Smalltalk convention (CR) as they are brought in. That's why Strings with LF characters in them are often a sign of a failure to convert properly.

The most common method to bring a String in from outside is to use an ExternalStream. And indeed if you take a closer look at those, you'll see that they have a configurable lineEndConvention. Setting an ExternalStream to #lineEndCRLF means that if you are using it to read e.g. a file, it will automatically convert the byte sequence #[13 10] into a single character CR. When writing into such a stream, it will automatically produce the byte sequence #[13 10] whenever it is given character CR. Note that it will write #[10] if you give it character LF instead, and similarly it will give you an LF when reading if it encounters a standalone byte 10 (one that is not preceded by 13). This is all assuming the stream is in text mode, which is the default mode. If the stream is set to binary mode, it will pass the byte Integers through as they are, without any conversions. But in that case you'll be getting ByteArrays out of the stream, not Strings. The last thing that the streams will do for you automatically is set a default line end convention based on the platform that the image is running on. So if you create an external stream on Linux, it will be set to #lineEndLF. On Windows it will be set to #lineEndCRLF, etc.

There's another kind of stream that provides these capabilities, the EncodedStream. The primary function of these is to convert bytes to characters according to a specified 'character encoding'. Character encoding is an even bigger can of worms, but for the purpose of this discussion let's just focus on the fact that EncodedStreams provide line end conversion the same way as external streams. In fact you can set an external stream to binary mode, wrap an EncodedStream around it and you should get the same results as with the external stream in text mode. This kind of setup will be handy in cases where you need to deal with character encodings that aren't supported by the external streams. There are in fact only very few that are supported by external streams: ISO8859_1, MSCP1252 for older Windows versions, and a few more obscure encodings. For anything else (e.g. UTF-8) you will need to use an EncodedStream. So it is actually quite important that the EncodedStream is polymorphic with ExternalStreams.
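
As a rough sketch of that setup, the following reads a UTF-8 file through an EncodedStream. It assumes that a Filename answers #withEncoding: and that the resulting stream accepts the same line end messages as an external stream; the file name is hypothetical, so check the selectors against your image before relying on this:

```smalltalk
| stream text |
"Hypothetical file name; #withEncoding: on a Filename is an assumption here."
stream := ('notes in utf8.txt' asFilename withEncoding: #utf8) readStream.
"Same line end protocol as on an ExternalStream: LFs in the file become CRs."
stream lineEndLF.
text := [stream upToEnd] ensure: [stream close].
text
```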

 

When tools need a hand

 

So far so good. We now know the tools that are available in the base image for dealing with line-end conversions. The automatic configuration for native platform convention usually works out just fine when you're dealing with a single platform. But what if you're accessing a file system from Windows that is shared with Linux ? These are the cases when you may need to intervene and set the line end convention appropriately yourself.
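
Here is a minimal sketch of such an intervention, assuming an image running on Windows that reads a file written on Linux (the file name is hypothetical). The stream would default to #lineEndCRLF on Windows, so the convention is overridden explicitly before reading:

```smalltalk
| stream text |
"Hypothetical file on a share that a Linux machine writes to."
stream := 'notes from linux.txt' asFilename readStream.
"Override the platform default so that each lone LF is read as a CR."
stream lineEndLF.
text := [stream upToEnd] ensure: [stream close].
text
```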

There's also a "pseudo" convention #lineEndAuto which makes the stream do a quick scan of an initial portion of the stream to determine which lineEndConvention to use based on the actual contents. This however may not be feasible on some types of external streams, e.g. socket streams, because it requires reliable positioning capability.

Yet another alternative is the following. Instead of insisting on a specific convention for the entire stream, we could simply convert any convention to CRs as the stream is being read. This is especially nice when you want to clean up a text with mixed conventions. Moreover this is actually doable without any pre-scanning and therefore could work on any kind of stream, even a stream that cannot be peeked and positioned. This functionality isn't available in the Base image though, so I'm going to plug here my AutoLineEndStream which does exactly that. It can be found in the ComputingStreams package in the public repository. With this the following will evaluate to true:



stream := #[10 13 10 10 13 13] asString readStream.
stream := AutoLineEndStream wrap: stream.
stream contents = '\\\\\' withCRs
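
The conversion needs only one flag of state (was the previous character a CR?), which is why no pre-scanning is required. The workspace sketch below is not the AutoLineEndStream implementation from the ComputingStreams package; it only illustrates the idea on internal streams, using the same input as the example above:

```smalltalk
| input output lastWasCR |
input := #[10 13 10 10 13 13] asString readStream.
output := String new writeStream.
lastWasCR := false.
[input atEnd] whileFalse: [
	| ch |
	ch := input next.
	ch = Character cr
		ifTrue: [
			"A CR always ends a line."
			output nextPut: Character cr.
			lastWasCR := true]
		ifFalse: [
			ch = Character lf
				ifTrue: [
					"An LF right after a CR completes a CRLF pair and is dropped;
					a lone LF ends a line on its own."
					lastWasCR ifFalse: [output nextPut: Character cr]]
				ifFalse: [output nextPut: ch].
			lastWasCR := false]].
output contents = '\\\\\' withCRs	"true"
```

The six input characters are LF, CRLF, LF, CR, CR, i.e. five line ends, hence the five CRs in the result.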


Note that AutoLineEndStream wraps a character stream (a stream that provides and consumes characters), e.g. an internal character stream or an external or encoded stream in text mode. The reason for this is the complexity of character encodings. Even characters CR and LF need to be encoded when they are converted to/from bytes, and the encoding isn't always just 13 or 10. A CR can be #[13 00] or #[00 13] in UTF-16 encoding depending on the endianness and it can get even worse with other encodings. In the general case it is impossible to pick CR and LF out of an encoded text without decoding the other characters as well. So making AutoLineEndStream work with bytes would force it deep into the character encoding issues. It is much simpler to leave this domain to EncodedStreams and operate on characters instead; suddenly its task becomes trivial. So if you need to read an external file, simply keep the stream (external or encoded) in text mode, set it to #lineEndTransparent, which means don't do any conversion (all CRs and LFs will be preserved), and wrap it in AutoLineEndStream. Here's the corresponding code:



stream := 'messed up file.txt' asFilename readStream.
stream lineEndTransparent.
stream := AutoLineEndStream wrap: stream.


Note that #lineEndTransparent is actually equivalent to #lineEndCR, just named differently to emphasize the effective transparency of this mode, and in this case definitely expresses the intent better.

I should also mention that AutoLineEndStream will perform the same conversion when writing. If you write a CR and then an LF into the stream, it will write only CR into the underlying stream. So if you happen to have a String with mixed up line ends in memory, simply writing it through an AutoLineEndStream should straighten that out as well. The following should also evaluate to true:



stream := String new writeStream.
stream := AutoLineEndStream wrap: stream.
stream nextPutAll: #[10 13 10 10 13 13] asString.
stream contents = '\\\\\' withCRs


That's all I can think of that is relevant to this topic. I believe the problem isn't particularly difficult to deal with once you have a good understanding of what is going on and what should be happening. I hope this article will help with that. Thanks for reading this far and if you have any corrections, suggestions, ideas, I'll be watching the comments eagerly.

posted by Martin Kobetic

cst

Cincom Smalltalk Summer Release

December 4, 2009

We will have the Summer Release of Cincom Smalltalk shipping as of June 30th. As soon as that's done, we'll post the new bits for NC download. The details:



Release Date: June 30, 2006

The summer releases of Cincom Smalltalk are maintenance/bug fix releases. As such, you won't see any new features coming out. You will see enhancements, bug fixes, etc.

The new VMs for Mac OS X (a PowerPC-only VM and an Intel Mac VM) will follow after the summer release, but before the winter release. We will formally ship these new VMs in the winter release, but they will be available via vw-dev (and to other interested parties) before then.

Highlights for VisualWorks

  • Security - We are including an implementation of PKCS #8
  • COM - The COM Automation Wizard can now save and restore settings to create a VW COM server image.
  • Font Matching - on Font matching failures, the system will return the best match it can find instead of raising an exception
  • WS* - Further Enhancements to various aspects of our WebServices implementation
  • NetClients - We have implemented support for Digest Authentication and for NTLM Authentication

There are various other improvements and bug fixes; the file fixedARs.txt on the CD includes an exhaustive list.

Highlights for ObjectStudio

  • We are providing Early Access for ObjectStudio 8 by request only - please contact James Robertson if you are interested
  • OLE bug fixes and enhancements
  • Database bug fixes and enhancements
  • Both the XML Parser and the Opentalk framework have been synchronized with the VisualWorks implementations

posted by James Robertson

techtips

Skipping in non-positionable streams

December 4, 2009

The latest version of the ComputingStreams in the public repository adds two new stream classes, CachedReadStream and CachedWriteStream. I'm not proud of the well worn, "cached", name and welcome any suggestions, but that's what it is for now. These streams are the first step in my (hopefully) final battle with stream positioning. If you've ever used streams in anger, you'll probably agree that many problems require yanking the stream position back and forth frantically. That of course clashes quite badly with the fact that some streams are inherently non-positionable.

Streams and Positioning

The canonical example of a non-positionable stream in my world is the socket stream. Once you write something into it, you can forget about going back and rewriting that length prefix on that message that you have just finished marshaling, because by that time the prefix may well already be at the receiving side. That's an obvious one, but as years of working with various kinds of streams go by you'll find out that even though most streams in VisualWorks subclass from PositionableStream, many are not positionable or, what's worse, are only partially positionable ... sometimes. For example even the socket streams, like all buffered external streams, do support the positioning API, and if you try, you'll find out that you are able to skip and peek ahead as long as you don't try to go too far back. So you're all happy that your little algorithm works with all kinds of streams until the point when your deployed algorithm happens to peek over the buffer boundary and blows up, unable to skip back. Or how about this bonus question: is a write stream on a file positionable or not? And how about a read-write stream on a file? OK, you might think, external streams are just messy, but internal streams should be clean, right? Hm, so what is the result of this one?

	(ByteArray new withEncoding: #utf8) readWriteStream
		lineEndCRLF;
		nextPutAll: 'abc';
		cr;
		nextPutAll: 'def';
		position: 5;
		next

I guess the #lineEndCRLF gives it away, but often with a multi-byte encoding like UTF-8, where individual characters may take anywhere from 1 to 4 bytes, your chances to nail the position you want to hit aren't that great. Now, to be fair, it seems that historically the stream library just wasn't designed with "stacking" streams on top of each other in mind, so positioning a character stream in terms of the encoded bytes in the specific case of EncodedStream isn't that big a deal in that context. However that just doesn't scale if you are dealing with streams potentially several levels of element translation deep. Imagine for example that you want to take your text, encode it in UTF-8, compress it, encrypt it and append a signature. You really need to position in terms of the elements at the top, not the bottom. Or more precisely, the positioning argument to a message sent to a stream must be expressed in terms of the elements that are written into or read from that particular stream. Otherwise you're forced to make assumptions about the overall composition of streams and completely break the abstraction layers. Just imagine that you are deploying your text processing application somewhere in China and you need to substitute some Chinese encoding for UTF-8, because UTF-8 is just too inefficient for that particular language.

Moreover, I'll argue that absolute positions themselves are the wrong abstraction. Streams in the most general sense are infinite: they don't end, they don't start, they just have their current position. Messages like #position, #atEnd, #reset, #contents don't even make sense there. If you rely on these messages in your stream processing you're placing severe restrictions on the kind of streams that you can deal with. And don't think this is just "too abstract". Socket streams, media streams, etc, are very much like that. The algorithm encoding a particular frame in the video stream shouldn't need any of those messages above. Similarly, asking a socket stream if it's #atEnd is nonsense as well. It cannot reliably answer true until the connection has been properly closed and we can be sure that we've received everything that was sent by the other party, and it cannot reliably answer false before then either. Until that point in time the correct answer to #atEnd is definitely not boolean.

So I hope I've convinced you now that absolute positioning and related messages like #atEnd should be avoided whenever possible. But since we still need to yank the current position of a stream around we obviously need to use at least some sort of relative positioning. That's what the #skip: message is for, as it allows positioning relative to the current position of the stream. Again, with element translation in mind it is important that even the relative positions are expressed in terms of the elements at the stream level to keep proper separation of abstractions. But #skip: definitely doesn't prevent (theoretically) infinite streams on either end.

This is where the CachedStreams come in. They sit on top of an arbitrary stream and support skipping regardless of the capabilities of the stream. The stream will never be asked to move back, only forward. Skipping is performed in terms of the elements that the stream deals with, so if you feed characters into the stream you skip in terms of characters, regardless of the characters being converted into bytes using UTF-8 by some stream below.

As you've probably guessed, it's achieved by caching. The stream maintains a fixed size buffer that it uses to cache elements. Originally I started implementing it as a single read-write stream, but it turned out that cached reads and writes have some conflicting requirements, so I ended up splitting it into separate read and write streams. The reasons will hopefully become clearer after a more detailed description of each.

Skipping in read streams.

Let's take the CachedReadStream first. The cache is implemented as a circular buffer used as a FIFO queue for the elements. It has a pointer to the current "top", i.e. pointing at the latest element read in from the underlying stream, and a "position" pointer to the current position within the buffer. Consequently the position can back off from the top one full length of the buffer, but not more. Initially the buffer is empty and only fills up as elements are read from the stream. Once it fills up, the "oldest" elements will start "falling through" the bottom of the buffer as new elements are added at the top. Skipping forward is unlimited, but the cache has to follow along to the target position, leaving the necessary number of previous elements behind.
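
The mechanics can be modeled in a few lines. The following is only a sketch of the idea in Python, not the actual CachedReadStream code: the class name CachedReader and its methods are invented for the illustration, and a bounded deque stands in for the circular buffer. The underlying source is only ever asked to move forward.

```python
from collections import deque

class CachedReader:
    """Toy model of a cached read stream over a forward-only source."""

    def __init__(self, source, size):
        self._source = iter(source)          # can only move forward
        self._cache = deque(maxlen=size)     # oldest elements fall out automatically
        self._behind = 0                     # how far the position sits below the top

    def next(self):
        if self._behind > 0:                 # re-read an element still in the cache
            item = self._cache[-self._behind]
            self._behind -= 1
            return item
        try:
            item = next(self._source)
        except StopIteration:
            raise EOFError("end of stream") from None
        self._cache.append(item)             # the new element becomes the top
        return item

    def skip(self, count):
        if count >= 0:                       # forward: the cache follows along
            for _ in range(count):
                self.next()
        elif self._behind - count > len(self._cache):
            raise ValueError("cannot skip back past the cached elements")
        else:
            self._behind -= count            # backward: just move the position

    def read(self, count):
        return "".join(self.next() for _ in range(count))

reader = CachedReader("abcdefghijklmnopqrstuvw", 5)
reader.skip(10)
print(reader.read(4))     # klmn
reader.skip(-5)
print(reader.read(10))    # jklmnopqrs
```

The model translates the end of the source into a hard error (EOFError here) instead of answering a placeholder value.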

The API is identical to other read streams, with the crucial difference that the behavior of #skip: (and consequently #peek) is predictable and consistent regardless of the type of the underlying stream. This also motivated the addition of the message #previous, a counterpart of #next, but it's not clear to me how useful it is to read the elements in reverse order. I guess time will tell how useful this capability will be in practice.

Another important (deliberate) difference is that this stream explicitly translates the EndOfStreamNotification into a hard IncompleteNextCountError. I believe that the whole deal with the notification and returning a nil as the result of #next at the end of the stream is a bad mistake, so this is an attempt to start moving away from this behavior. We'll see if I have to back down on this one. The name EndOfStream for the error would probably be better, however the IncompleteNextCountError has been used for a very similar purpose for a long time so I have doubts about introducing a new exception class for almost the same thing. We'll see.

This would be a good time to show an example but since the API is the same traditional stuff, the best I can offer is this snippet of test code.

	| stream |
	stream := (ByteArray new withEncoding: #utf8) readWriteStream.
	stream nextPutAll: 'abcdefghijklmnopqrstuvw'.
	stream reset.
	stream := stream readCache: 5.
	stream skip: 10.
	stream next: 4.		" => 'klmn' "
	stream skip: -5.		" that wouldn't work with bare encoded stream! "
	stream next: 10.	" => 'jklmnopqrs' "
	stream nextAvailable: 100.	" =>  'tuvw' "
	stream next.		"raises an IncompleteNextCountError"

The implementation also pays special attention to the block based APIs based on #next:into:startingAt:, translating that to as few block based operations on the buffer and on the underlying stream as possible. This allows taking advantage of block based copying primitives and can push through larger quantities of data more efficiently. This is clearly an optimization, but the difference is significant enough in most non-trivial applications.

Skipping in write streams.

The CachedWriteStream is similar in many respects but there are significant differences. Written elements are cached in the buffer and only written down into the underlying stream when they fall through the bottom. This is to allow skipping back and rewriting the contents of the buffer arbitrarily until it gets flushed, which is useful for generating things like length prefixed message formats and such. This is the primary difference between the write and read streams. With read streams the position of the underlying stream is attached to the top of the buffer, but with write streams it is attached to the bottom. Another difference is that skipping forward in a write stream is also limited by the top of the buffer, because that represents the absolute end of the write stream, there are no more elements beyond it.

The write stream also provides reading capability within the confines of the buffer, because it is safe and easy to do and can be useful with some types of algorithms. Note however, that the reading and writing operations share the same position pointer. So a #next moves the position the same way as #nextPut:. So, for example

	(String new writeStream writeCache: 5)
		nextPutAll: 'abcdef';
		skip: -4;
		next: 2;
		nextPutAll: 'EF';
		contents

yields 'abcdEF'. Otherwise the API is identical to other write streams. However, it is important to note that when you're done with the write stream you have to #flush or #close the stream in order to get the buffered elements written into the underlying stream. That applies equally to both internal and external streams. Sending #close to all streams when you're finished with them is a good habit to develop anyway; that's the only way to have your algorithms work properly with both internal and external streams. Some kinds of encodings (like base-64 for example) also need to be informed to flush at the end to terminate the encoding properly, even when working in memory.
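
The write side can be modeled the same way. Again this is a hedged sketch in Python with invented names (CachedWriter, put, flush), not the real CachedWriteStream: elements stay in the buffer until they fall through the bottom or until the stream is flushed, and reads and writes share one position.

```python
from collections import deque

class CachedWriter:
    """Toy model of a cached write stream in front of a forward-only sink."""

    def __init__(self, sink, size):
        self._sink = sink            # a list standing in for the underlying stream
        self._size = size
        self._buffer = deque()       # pending elements, oldest first
        self._position = 0           # shared by reads and writes

    def put(self, item):
        if self._position < len(self._buffer):
            self._buffer[self._position] = item        # overwrite a pending element
        else:
            if len(self._buffer) == self._size:        # full: oldest falls through
                self._sink.append(self._buffer.popleft())
                self._position -= 1
            self._buffer.append(item)
        self._position += 1

    def next(self):
        if self._position >= len(self._buffer):
            raise EOFError("nothing written beyond the top of the buffer")
        item = self._buffer[self._position]
        self._position += 1
        return item

    def skip(self, count):
        target = self._position + count
        if not 0 <= target <= len(self._buffer):
            raise ValueError("skip is confined to the buffered elements")
        self._position = target

    def flush(self):
        self._sink.extend(self._buffer)
        self._buffer.clear()
        self._position = 0

sink = []
writer = CachedWriter(sink, 5)
for character in "abcdef":
    writer.put(character)
writer.skip(-4)
print(writer.next() + writer.next())   # cd
writer.put("E")
writer.put("F")
writer.flush()
print("".join(sink))                   # abcdEF
```

Note that before the flush only 'a' has reached the sink, which is why forgetting to flush or close loses the tail of the output.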

The write stream also pays special attention to the block based APIs based on #next:putAll:startingAt:, translating that to as few block calls as possible.

Concluding Remarks

From the 300ft view the cached streams look very much like the buffered streams in the library, the only difference being that the buffer is circular with the cached stream. That, however, is the crucial difference, catering to a different purpose. The buffered streams are meant to accumulate small reads and writes to minimize the expense of the system read-write calls. A circular buffer wouldn't work at all for this purpose. Conversely, the discontinuous operation of these buffers is the cause of the socket stream blowing up on a simple peek when it happens to cross the buffer boundary. You really need a continuous (circular) buffer to support reliable skipping. A different purpose needs a different buffering strategy.

The cached streams should work well for any algorithm where you can predict the maximum size of the step back that you're going to need. However, that is not always a practical assumption. My favorite example for this is BER/DER encoding in ASN.1, where you have deeply nested trees of length prefixed encodings of pretty much arbitrary size. There's an idea I'd like to pursue in this regard, inspired by the "marking" capability I noticed in the java.nio.Buffer class when I was looking at the stream hierarchy in J2SE. I envision being able to #mark the current position in the stream to inform it that it needs to start caching, and being able to #reset back to that position. However, to be able to handle nested structures like BER encoding conveniently I think I'll need to maintain a stack of markers. Also, since the buffer will need to grow arbitrarily, I plan to use a paged buffer so that I can grow and shrink it efficiently. Anyway, I'll see how that's going to pan out. It might make a nice topic for another post.

posted by Martin Kobetic

support

Cincom Smalltalk Support Resolutions

December 4, 2009

This comes from the Cincom Smalltalk Support team - it's a listing of resolutions that have gone out recently:

The resolutions listed below were developed by Cincom Smalltalk Support and Engineering to solve problems reported by our customers. These resolutions may or may not help solve any specific problem that you might encounter. We strongly advise you to back up your application before applying the suggested fix or work-around in case you may need to restore your application to its previous state. Resolutions can be viewed at http://SupportWeb.cincom.com .

Contents

  • Technical note
  • VisualWorks resolutions and patches
  • ObjectStudio resolution and patch builds

Technical Note -- Cloning a VW Store Repository (Resolution 89189)

Cincom has not officially published a way to clone your repository, and this operation isn't formally supported. But we do have some suggestions about how it can be done.

First, note that there is a currently unsupported package (widely used, though) called the Store Replicator. Many users load this and create replicas of their main repositories. It allows for cross-platform/cross-db replicas.

The other way is to make a backup copy of your repository using the DBA's backup commands for that db, and restore the database into a new server. Make sure that you include the VIEWS as well as the TABLES and INDEXES, and TABLESPACES, etc. To help verify that the process is complete, compare your results to all the database objects listed in the install script (generated by executing: DbRegistry createInstallScript). These should all be present on the new server.

After that, ensure that the sequence generator is working correctly. It should not be reset, since that would result in duplicate key values generated. (There are ways to examine this in SQL.)

Update the name of the repository in the DatabaseIdentifier table, using the new repository name (if you don't want it to clash with the original name, which was copied over to the cloned db table). This can be done in Smalltalk, or using SQL.

Finally, on Oracle, if you are cloning your db into a different database on the same server, make sure you update the VIEWs because, as copied, they will point to the original database (say, BERN.xxx). The views for the cloned repository should point into its own database (say, ABC.xxx). So, you'll have to update the VIEW by hand. This can be done in Smalltalk, or using SQL.

VisualWorks -- Newest Resolutions

Release Resolution ID Description
7.4 89294 How to delete exactly one package in Store?
7.4 89189 Duplicate of Oracle Store repository does not work. What's wrong?
7.4 89160 How to find which protocol a method belongs to when the settings are set to show all methods when no protocol is selected.
7.4 88884 OracleConnection>>quiesce can mix up database because the rollback of the super class method is not performed.
7.4 88873 The VM crashes with "Out of memory" when a stack fault occurs (infinite recursion).
7.3.1 88702 WebToolKit does not consider TimeOut set in server console and resolver.
7.3.1 88681 We use WebToolKit with VisualWave and Tiny-Http-Webserver. Now we want to switch to an Apache or BEA-Logic web server.
7.3.1 88661 How to manipulate CrystalReports via COM-Connect.
7.3.1 88544 How to upgrade store and use parallel from both releases?
7.3.1 88541 VisualWorks under Windows 2003 Server, does it work?
7.3.1 88505 How can I get Visual Basic to work with the WizardExample.Application COM server example?
7.3.1 88346 Deleting in an InputField copies an empty line into PreviousSelections plus other problems with the CopyBuffer resp. PreviousSelections list.
7.3.1 88063 Ora error 00942: 'table or view not ...' when loading 7.1 application into new 7.3.1 image.
7.3.1 87659 Parcel not marked dirty on method changes.
7.3 88540 Do Toggle buttons exist in VW?
7.3 88474 Lens does not resolve proxy. Oracle error thrown.

VisualWorks -- Newest Patch

VW 7.3.1 AR 48943: Wrong X2O binding for a complex type with a few choices http://www.cincomsmalltalk.com/CincomSmalltalkWiki/VW+7.3.1+Patches

VW 7.4 AR 50145: VW Internet Explorer Plugin broken in 7.4 http://www.cincomsmalltalk.com/CincomSmalltalkWiki/VW+7.4+Patches

ObjectStudio(R) -- Newest Resolutions

Release Resolution ID Description
7.4 89294 How to delete exactly one package in Store?
7.4 89189 Duplicate of Oracle Store repository does not work. What's wrong?
7.4 89160 How to find which protocol a method belongs to when the settings are set to show all methods when no protocol is selected.
7.4 88884 OracleConnection>>quiesce can mix up database because the rollback of the super class method is not performed.
7.4 88873 The VM crashes with "Out of memory" when a stack fault occurs (infinite recursion).
7.3.1 88702 WebToolKit does not consider TimeOut set in server console and resolver.
7.3.1 88681 We use WebToolKit with VisualWave and Tiny-Http-Webserver. Now we want to switch to an Apache or BEA-Logic web server.
7.3.1 88661 How to manipulate CrystalReports via COM-Connect.
7.3.1 88544 How to upgrade store and use parallel from both releases?
7.3.1 88541 VisualWorks under Windows 2003 Server, does it work?
7.3.1 88505 How can I get Visual Basic to work with the WizardExample.Application COM server example?
7.3.1 88346 Deleting in an InputField copies an empty line into PreviousSelections plus other problems with the CopyBuffer resp. PreviousSelections list.
7.3.1 88063 Ora error 00942: 'table or view not ...' when loading 7.1 application into new 7.3.1 image.
7.3.1 87659 Parcel not marked dirty on method changes.
7.3 88540 Do Toggle buttons exist in VW?
7.3 88474 Lens does not resolve proxy. Oracle error thrown.

ObjectStudio -- Newest Patch

ObjectStudio Patch builds are available on request. Please contact support at helpna@cincom.com (North America) or eurotsc@cincom.com (Europe).

ObjectStudio 7.1 Patch build os71f032306 contains the following fixes.

82247 Returns the correct table when accessing tables created by someone else when you are the owner of a same named table.

82203 Problem with Databases Controller 'Sending message #fieldNames to nil' when accessing DB2 tables.

82126 Improved word wrapping behavior for workplace object titles.

82120 Fixed debugger when setting RadioButton options to nil.

82101 In certain circumstances, the text editor could delete a file when trying to save it.

82029 Release WindowPen resources properly after drawing bitmap buttons.

82024 Correct a problem with OLE components marshalling strings. The problem would occur when the Smalltalk code defined the interface as VT_BSTR but the interface was actually (VT_BSTR | VT_BYREF).

81998 Correct errors in the Small Program Generator (spgen.txt).

81991 MS SQLServer, the SQL statement 'Select cnt = @@rowcount' was not working properly.

81907 Keep required support classes in the image when specifying to keep debugger interfaces from the ProgramGenerator.

81863 After fix for FR 34181, the DB2 wrapper did not report the problem correctly when warning message number was +100.

81859 Set initial focus properly (this problem existed only in a 7.0.1 patch release).

81855 Fix a Memory Access Violation with DB2 when loading tables with no $. specifier

81851 Fix problems comparing NAN with other numbers, (NAN = anyNumberOtherThanNAN) is always false now.

81836 Fix a focus problem - a window from another application was being given focus when closing a ModalDialogBox (this problem existed only in a 7.0.1 patch release).

81810 Fix a problem with fonts being incorrectly displayed as bold when a window was covered and then uncovered.

81796 Sybase MONEY, MONEY4 and DECIMAL types are now retrieved with precision 38 and scale 4 for greater accuracy.

81700 Width and Height were reversed in the inst var portRect when creating Bitmaps.

81699 transparentMask was not being set correctly for instances of TransparentBitmap. See new method TransparentBitmapClass>>newFromBitmap:transparentColor:

81685 Current cursor position (getSelection) was sometimes incorrect for FormString.

81516 The charWidth instance variable of a TabListCtrl is nil until the font has been set, which could sometimes cause a "Sending message #asFloat to nil" error. TabListCtrl>>charWidth now ensures that the font has been set.

81233 lostFocus and performValidationFor: behavior in ModalControllers was made consistent with that in non-modal Controllers.

  • The VM now sends #checkForAnotherPageAcceleratorActivation: to a PropertyPage on an Alt+ combination to implement a check for a mnemonic.
  • To activate a different page in the same PropertySheet, define the PropertyPage's name/title with an '&' before the character to be underlined.

81205 TabListCtrl now increases the rowHeight slightly to provide space for gridlines if the #Lined option is on.

81187 Fixed crash in SocketReadStream>>primUpTo: and possibly incorrect results from InternalStream>>primSkipTo:

81169 Duplicate instance variables were removed and code that modifies literal arrays was rewritten to not do so.

81129 Fixed GlobalDictionary>>commandLineOptionAt: to preserve the case of the command line arguments.

81097 Restored performance of String>>at:put: with very large receivers and indices over 32K (this performance had degraded in 6.9.1) and improved performance of sending #replaceFrom:to:with: to a String, thus improving StreamString.

81083 Copy & paste from Outlook into Workspace or browser in ObjectStudio does not allow re-paste into Outlook.

81061 The Items and Available Methods lists in the Designer's Return Key Method Assignment dialog are now sorted.

81047 Fix debugger in XML file parsing by defining PredefinedEntities.

81028 GlobalDictionary>>locale was treating the registry value HKEY_CURRENT_USER\Control Panel\International\Locale as a String representing a decimal number instead of a hexadecimal number, causing incorrect results.

80880 Created the method CharacterClass>>euro to answer the Euro symbol properly in both classic and Unicode versions of ObjectStudio, and use that method to initialize AvailableCurrency.

80748 Fix load error with CommBuilder.

80730 Avoid TopicBox flickering when running ObjectStudio with a manifest file on a machine using the Windows XP style appearance.

80589 Fix debugger 'Instance method #onReturn is not found in Class FormTreeView'.

80556 Fix TreeViewCtrl>>displaySample to always select the first item in the treeview after setting a new list, to make it visibly apparent when that treeview gets focus.

80495 WorkplaceObjects were not being properly displayed if they were under active child forms.

80157 Implemented FormItem>>handleBeep to avoid debugger from InterfacePart>>invalidDataFor:

79721 Made mouse scrollwheel work properly with TabListCtrl items in Win2000 and greater.

79805 Redesigned bitmap button implementation to properly display disabled state in all cases.

79412 ReadIniFileStreamclass>>file:onError: now answers the result of the error block on error.

79087 Module>>privateLoad:onError: was changed to answer the result of the error block on error.

78813 Wait cursor issues partially fixed, the wait cursor still doesn't display properly when selecting a drop down list box item by mouse.

78145 ControllerItem>>hitKeyKeyName: refactored to ensure that (System showWaitCursor) is always sent, and a VM flag to suspend displaying of the wait cursor is now maintained properly in all cases.

74888 Display of bitmaps from deployment image under Windows 2000 had wrong colors.

74351 String>>replaceExpression:with:options:times: was modified so the argument to with: could be as large as 16K characters instead of only 256 characters.

72767 ObjectStudio will now open a MDI subform maximized if that option is set in the Designer.

34216 Many implementors of privateExecSql: were modified to raise an error if the argument is nil.

32397 Display hourglass while loading an application.

ObjectStudio 7.1 build os701f030906 contains the following fixes. There are known issues with default button resulting in a double event and a GDI memory resource leak.

72767 ObjectStudio will now open a MDI subform maximized if that option is set in the Designer [updated].

74888 Can use a TransparentBitmap for button.

79087 Change Module class>>privateLoad:onError: to answer the result of evaluating the error block instead of the integer error code on error.

79805 Redesigned bitmap button implementation to properly display disabled state in all cases.

80053 OLE_DEMO Error: Avoid error in CustomerServiceController by not attempting to close the MS Word file if we haven't opened it yet.

80555 Further updates to avoid calling the default button action twice.

80706 Do not process mnemonic accelerator events for SubForms that don't have focus.

80880 Created the method CharacterClass>>euro to answer the Euro symbol properly in both classic and Unicode versions of ObjectStudio, and use that method to initialize AvailableCurrency.

81047 Initialize PredefinedEntities for XML Parser.

81233 Multiple related changes, including lostFocus (Form vs. ModalDialogBox), #performValidationFor:, keyboard handling.

81699 transparentMask was not being set correctly for instances of TransparentBitmap. See new method TransparentBitmapClass>>newFromBitmap:transparentColor:

81851 Float>>= was behaving in unpredictable ways when the receiver or argument was NAN. If either is NAN, it will now answer true only if both are NAN.

81998 spgen has errors during load.

82029 Correct Bitmap Button regressions, including resource leak.

82210 FormOLEControl items were not getting focus when tabbed into, and interrupting tab traversal sequence.

82240 MDI: maximize option, extra click needed to open login window.

82241 Regression fixed for regular expression.

82469 Horizontal grid lines disappeared when scrolling back up in TabListCtrl.

82505 Alt+ combinations were not opening menus as they should from MDI child windows.

82534 Fixed a crash when an OCX passed VT_EMPTY as an argument instead of the defined VT_VARIANT.

We value your feedback. Please feel free to send me your questions or concerns.

Kim Thomas,
Smalltalk Support Manager
kthomas@cincom.com
Cincom Systems, Inc.
55 Merchant Street
Cincinnati, Ohio 45246

posted by James Robertson

techtip

VW 7.4 Spoilers - ASN.1

December 4, 2009

I've mentioned previously that we have spent a lot of time on ASN.1 in this release cycle, so I'd better say something about it. This article won't be an introduction to ASN.1, though; I want to focus on the improvements in our implementation. There are some easy introductions available, and even free books for the gory details.

I figured that the best way to demonstrate the framework is to show how it's used in an application, and our most interesting application so far is the X.509 framework, so let's take a look at that. The X.509 framework is structurally fairly simple: you have a hierarchy of X509Objects representing various components of an X.509 Certificate and a few supporting classes like X509Registry or specific exception classes. The job of the ASN.1 framework is to turn X509Object instances into DER encoded bytes and back. To be able to do that, the framework needs a structural description of the encoded bytes. It needs to know that an encoded certificate starts with an encoded TBSCertificate, then an identifier of the algorithm used to sign the contents of the TBSCertificate (TBS stands for 'to be signed' here) and finally the bytes of the signature itself. ASN.1 describes this structure using a C-ish notation. The Certificate definition looks as follows:


	Certificate  ::=  SEQUENCE  {
		tbsCertificate       TBSCertificate,
		signatureAlgorithm   AlgorithmIdentifier,
		signatureValue       BIT STRING  }


A SEQUENCE is like a struct in C and the elements show name first and type second. Our framework represents this information with a structure of ASN1.Type objects, closely following the ASN.1 expressions. In this case it would be something like


	(SEQUENCE name: #Certificate)
		addElement: #tbsCertificate type: TBSCertificate;
		addElement: #signatureAlgorithm type: AlgorithmIdentifier;
		addElement: #signatureValue type: BIT_STRING;
		yourself


This expression will not work correctly though, because the #type: arguments would have to be other ASN1.Type objects. If you build the type objects in the right order, making sure all the component types are created before the containing types, you might be able to build the structure by hand; however, that would be very inconvenient to maintain. It is OK for relatively simple structures, like


"	RSAPublicKey ::= SEQUENCE {
		modulus           INTEGER,  -- n
		publicExponent    INTEGER   -- e
	}
"
	(SEQUENCE name: #RSAPublicKey)
		addElement: #modulus type: INTEGER;
		addElement: #publicExponent type: INTEGER;
		yourself
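
For readers who haven't seen DER bytes before, this is roughly what the type description above has to drive. The sketch below is Python with hand-rolled helper names (der_length, der_integer, der_sequence); it has nothing to do with the framework's actual classes and only handles non-negative INTEGERs, but it shows the tag-length-value layout that a SEQUENCE of two INTEGERs turns into.

```python
def der_length(length):
    """Encode a length: short form below 128, long form otherwise."""
    if length < 0x80:
        return bytes([length])
    body = length.to_bytes((length.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body

def der_integer(value):
    """Encode a non-negative INTEGER (tag 0x02) in two's complement."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    # One extra bit is reserved for the sign, hence the leading 00 for 128.
    body = value.to_bytes(value.bit_length() // 8 + 1, "big")
    return b"\x02" + der_length(len(body)) + body

def der_sequence(*elements):
    """Wrap already encoded elements in a SEQUENCE (tag 0x30)."""
    body = b"".join(elements)
    return b"\x30" + der_length(len(body)) + body

def rsa_public_key(modulus, public_exponent):
    return der_sequence(der_integer(modulus), der_integer(public_exponent))

# Toy key, far too small for real use: n = 3233, e = 17
print(rsa_public_key(3233, 17).hex())   # 300702020ca1020111
```

The nesting of length prefixes is also the reason the outer length can only be written once the inner elements are known.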


For the more complex cases the framework provides a more convenient mechanism, a Module. An ASN.1 Module is a container for a set of related Type definitions and consequently provides a context for lookup of types by name. As soon as you put a SEQUENCE into a Module, you can define its elements using type names instead of full instances and it also takes care of resolving forward references, i.e. you can define types in any order you wish, their mutual references will be resolved properly as the type definitions get added. The X509 framework maintains its module in a shared class variable X509Object.ASN1Module. Using a module the type definition for Certificate can look as follows:


	module := Module new: #X509.
	tCertificate :=
		(module SEQUENCE: #Certificate)
			addElement: #tbsCertificate type: #TBSCertificate;
			addElement: #signatureAlgorithm type: #AlgorithmIdentifier;
			addElement: #signatureValue type: #BIT_STRING;
			yourself
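
How forward references can be made to work is easy to model: store the type name in the element and look it up in the module only when it is needed. The Python sketch below illustrates just that idea with made-up classes (Module, Sequence, Primitive); how the real ASN1 Module resolves references internally is not shown in this post, so treat the mechanism as an assumption.

```python
class Primitive:
    """Stand-in for a simple type such as BIT_STRING."""
    def __init__(self, name):
        self.name = name

class Sequence:
    """Stand-in for a SEQUENCE whose elements refer to types by name."""
    def __init__(self, module, name):
        self.module = module
        self.name = name
        self.elements = []

    def add_element(self, element_name, type_name):
        self.elements.append((element_name, type_name))   # keep the name only
        return self

    def unresolved(self):
        return [t for _, t in self.elements if t not in self.module.types]

    def element_types(self):
        # Lookup happens here, so definition order does not matter.
        return [(n, self.module.types[t]) for n, t in self.elements]

class Module:
    """A namespace of type definitions that provides lookup by name."""
    def __init__(self, name):
        self.name = name
        self.types = {}

    def sequence(self, type_name):
        self.types[type_name] = Sequence(self, type_name)
        return self.types[type_name]

    def primitive(self, type_name):
        self.types[type_name] = Primitive(type_name)
        return self.types[type_name]

module = Module("X509")
certificate = (module.sequence("Certificate")
               .add_element("tbsCertificate", "TBSCertificate")
               .add_element("signatureAlgorithm", "AlgorithmIdentifier")
               .add_element("signatureValue", "BIT_STRING"))
print(certificate.unresolved())    # all three names are still forward references

module.primitive("BIT_STRING")
module.sequence("TBSCertificate")
module.sequence("AlgorithmIdentifier")
print(certificate.unresolved())    # []
```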


Once we have the type definitions in place, the marshaling framework knows enough about the encoded bytes; however, in order to be able to map objects to bytes, it needs to know how the types correspond to classes. For most of the "simple" types there's a predefined correspondence, i.e. BOOLEAN maps to Booleans, INTEGER to Integers, etc. However, SEQUENCE and SET types default to instances of ASN1.Struct, which is kind of like a Dictionary but with a few convenience gimmicks, like being able to use the element names as accessors and such. But we don't want a Struct instance for Certificates, we want instances of the Certificate class. That's why the SEQUENCE and SET types have an optional 'mapping' attribute. You can tell it to map to a given Smalltalk class. It is the responsibility of the developer to make sure that the class provides all the expected accessor methods. The Certificate class does that of course, so all that needs to be done is to tell the type about it:


	tCertificate mapping: Certificate


The last type feature to discuss is encoding retention. X.509 has a fairly tortured history and the practical outcome of it is the rule to never ever re-encode a certificate. Therefore it is desirable for a certificate imported from outside to retain its DER encoding, in case it needs to be exported again. The retained encoding also serves as a cache, so writing out an object with retained encoding can simply dump the retained bits, instead of going through the encoding process.

Any Type can be told to retain its encoding. An encoding is captured in an instance of ASN1.Encoding pointing to the relevant bytes. The framework will pass the Encoding to the corresponding object using the #_encoding:type: message. The default implementation in Object will wrap the object in a TypeWrapper, which has a slot to capture the encoding. However, it is expected that most applications will simply allocate a slot directly in the objects that will retain encoding, and will therefore most likely override the method to store the Encoding there (see Certificate>>#_encoding:type: for example).

Note also that the encoding retention behavior was factored out of the marshaling machinery into a standalone EncodingPolicy object and is therefore completely customizable. An interesting side-effect of this is that with a customized EncodingPolicy you get a chance to intervene in the marshaling at interesting points in the process. One possible use of this, which was very handy for us while trying to track down bugs in the marshaling process, is the PrettyPrinter policy, which produces a map of the bytes on a text stream while marshaling.

Here's an example. Let's take something simpler than Certificate, for example the Name used for issuer or subject fields of Certificate. The full ASN.1 definition of Name is somewhat complex, but this is the part relevant to our example:


"	Name ::= CHOICE { RDNSequence }
	RDNSequence ::= SEQUENCE OF RelativeDistinguishedName
	RelativeDistinguishedName ::= SET OF AttributeTypeAndValue
	AttributeTypeAndValue ::= SEQUENCE {
		type     AttributeType,
		value    AttributeValue }
	AttributeType ::= OBJECT IDENTIFIER
	AttributeValue ::= ANY DEFINED BY AttributeType
"
	(module CHOICE: #Name)
		addElement: nil type: #RDNSequence;
		retainEncoding: true.
	module SEQUENCE: #RDNSequence OF: #RelativeDistinguishedName.
	module SET: #RelativeDistinguishedName OF: #AttributeTypeAndValue.
	(module SEQUENCE: #AttributeTypeAndValue)
		addElement: #type type: #AttributeType;
		addElement: #value type: #AttributeValue.
	module OBJECT_IDENTIFIER: #AttributeType.
	module ANY: #AttributeValue.


Basically a Name is a somewhat nested collection of attributes, where each attribute has a type and a value. Now let's unmarshal an encoded Name using the PrettyPrinter policy.


	bytes := 16r3068310B3009060355040613025553311330110603550407130A43696E63696E6E61746931173015060355040A130E43696E636F6D2053797374656D7331193017060355040B131043696E636F6D20536D616C6C74616C6B3110300E0603550403130754657374204341 asBigEndianByteArray.
	marshaler := DERStream with: bytes.
	output := String new writeStream.
	marshaler encodingPolicy: (PrettyPrinter on: output).
	marshaler reset.
	name := marshaler unmarshalObjectType: module Name.
	output contents


If all goes well the result of the above code should look as follows:


0	Name
0		RDNSequence
2			RelativeDistinguishedName
4				AttributeTypeAndValue
6					AttributeType
11					ObjectIdentifier(2.5.4.6)
11					AttributeValue
15					'US'
15				AttributeTypeAndValue {type ObjectIdentifier(2.5.4.6), value 'US'}
15			OrderedCollection (AttributeTypeAndValue {type ObjectIdentifier(2.5.4.6), value 'US'})
15			RelativeDistinguishedName
17				AttributeTypeAndValue
19					AttributeType
24					ObjectIdentifier(2.5.4.7)
24					AttributeValue
36					'Cincinnati'
36				AttributeTypeAndValue {type ObjectIdentifier(2.5.4.7), value 'Cincinnati'}
36			OrderedCollection (AttributeTypeAndValue {type ObjectIdentifier(2.5.4.7), value 'Cincinnati'})
36			RelativeDistinguishedName
38				AttributeTypeAndValue
40					AttributeType
45					ObjectIdentifier(2.5.4.10)
45					AttributeValue
61					'Cincom Systems'
61				AttributeTypeAndValue {type ObjectIdentifier(2.5.4.10), value 'Cincom Systems'}
61			OrderedCollection (AttributeTypeAndValue {type ObjectIdentifier(2.5.4.10), value 'Cincom Systems'})
61			RelativeDistinguishedName
63				AttributeTypeAndValue
65					AttributeType
70					ObjectIdentifier(2.5.4.11)
70					AttributeValue
88					'Cincom Smalltalk'
88				AttributeTypeAndValue {type ObjectIdentifier(2.5.4.11), value 'Cincom Smalltalk'}
88			OrderedCollection (AttributeTypeAndValue {type ObjectIdentifier(2.5.4.11), value 'Cincom Smalltalk'})
88			RelativeDistinguishedName
90				AttributeTypeAndValue
92					AttributeType
97					ObjectIdentifier(2.5.4.3)
97					AttributeValue
106					'Test CA'
106				AttributeTypeAndValue {type ObjectIdentifier(2.5.4.3), value 'Test CA'}
106			OrderedCollection (AttributeTypeAndValue {type ObjectIdentifier(2.5.4.3), value 'Test CA'})
106		OrderedCollection (OrderedCollection (AttributeTypeAndValue {type ObjectIdentifier(2.5.4.6), value 'US'})
OrderedCollection (AttributeTypeAndValue {type ObjectIdentifier(2.5.4.7), value 'Cincinnati'}) OrderedCollection 
(AttributeTypeAndValue {type ObjectIdentifier(2.5.4.10), value 'Cincom Systems'}) OrderedCollection
(AttributeTypeAndValue {type ObjectIdentifier(2.5.4.11), value 'Cincom Smalltalk'}) OrderedCollection 
(AttributeTypeAndValue {type ObjectIdentifier(2.5.4.3), value 'Test CA'}))
106	Name<RDNSequence:OrderedCollection/>

The numbers at the beginning of each line show offsets into the source bytes. Each unmarshaled entity has two entries in the output, its type at the offset where it starts and the value printString at the offset where it ends. Simple types will have 2 entries next to each other and constructed types will have their elements indented.
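
The offsets in the listing can be cross-checked without the framework at all. The following Python sketch walks the same 106 bytes as raw tag-length-value triples and prints where each one starts and ends. It knows nothing about the Name type definition, so it shows generic tag names (SEQUENCE, SET) instead of RDNSequence and friends, and the function names (read_header, decode_oid, walk) are invented for the example.

```python
NAME_DER = bytes.fromhex(
    "3068"
    "310B" "3009" "0603550406" "13025553"
    "3113" "3011" "0603550407" "130A43696E63696E6E617469"
    "3117" "3015" "060355040A" "130E43696E636F6D2053797374656D73"
    "3119" "3017" "060355040B" "131043696E636F6D20536D616C6C74616C6B"
    "3110" "300E" "0603550403" "130754657374204341"
)

TAG_NAMES = {0x30: "SEQUENCE", 0x31: "SET"}

def read_header(data, offset):
    """Return (tag, content_start, content_end) for the TLV at offset."""
    tag = data[offset]
    length = data[offset + 1]
    content_start = offset + 2
    if length & 0x80:                      # long form: low 7 bits count length bytes
        count = length & 0x7F
        length = int.from_bytes(data[content_start:content_start + count], "big")
        content_start += count
    return tag, content_start, content_start + length

def decode_oid(content):
    """Decode OBJECT IDENTIFIER content bytes into dotted notation."""
    subids, value = [], 0
    for byte in content:
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:                # high bit clear ends a sub-identifier
            subids.append(value)
            value = 0
    arc1 = min(subids[0] // 40, 2)         # first two arcs share one sub-identifier
    return ".".join(str(arc) for arc in [arc1, subids[0] - 40 * arc1] + subids[1:])

def walk(data, offset=0, end=None, depth=0):
    """Yield (start, end, depth, description) for every TLV in data[offset:end]."""
    end = len(data) if end is None else end
    while offset < end:
        tag, content_start, content_end = read_header(data, offset)
        content = data[content_start:content_end]
        if tag & 0x20:                     # constructed: descend into the elements
            yield offset, content_end, depth, TAG_NAMES.get(tag, "tag 0x%02X" % tag)
            yield from walk(data, content_start, content_end, depth + 1)
        elif tag == 0x06:
            yield offset, content_end, depth, "ObjectIdentifier(%s)" % decode_oid(content)
        elif tag == 0x13:                  # PrintableString
            yield offset, content_end, depth, repr(content.decode("ascii"))
        else:
            yield offset, content_end, depth, "tag 0x%02X %s" % (tag, content.hex())
        offset = content_end

for start, stop, depth, description in walk(NAME_DER):
    print("%3d..%-3d %s%s" % (start, stop, "  " * depth, description))
```

The start offsets of the SETs (2, 15, 36, 61, 88) and the end offsets of the strings (15, 36, 61, 88, 106) line up with the PrettyPrinter output above.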

There's much more to talk about in ASN.1, things like tagging, sub-typing, constraints, etc. But this post is again getting long, so I'll pick this up some other time. If the release is out before I get to it, you can find more about ASN.1 in the release notes and in the shiny new ASN.1 chapter of the SecurityGuide.pdf. Until then, thanks for reading this far and Happy Holidays!

posted by Martin Kobetic