general

WebSupport updates

November 28, 2007 23:09:24 EST

Our Seaside effort yields some useful byproducts, including improvements to the so far rather Spartan WebSupport package. This package now provides HttpClient and HttpRequest extensions that simplify submitting HTML form data through the HTTP POST method.

In general, form data can be submitted in a "url encoded" format in a simple, single-part HTTP request (content-type: application/x-www-form-urlencoded), or each data entry can be submitted as an individual part in a multipart HTTP request (content-type: multipart/form-data). Multipart messages are used when form data contains entries with relatively large values, for example when a form has external files attached to it for upload to the server. More information about HTML forms can be found at http://www.w3.org/TR/html401/interact/forms.html#h-17.13.

The default behavior of the WebSupport extensions is to submit forms as simple requests. Form entries can be added individually using the #addFormKey:value: message, or set all at once using the #formData: message, which takes a collection of Associations. Note that #formData: replaces any previous form content. The following example

	stream := String new writeStream.
 	(HttpRequest post: 'http://localhost/xx/ValueOfFoo')
		addFormKey: 'foo' value: 'bar';
		addFormKey: 'file'  value: 'myFile';
		writeOn: stream.
	stream contents

yields this result:

POST /xx/ValueOfFoo HTTP/1.1
Host: localhost
Content-type: application/x-www-form-urlencoded
Content-length: 19

foo=bar&file=myFile
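
The same request can presumably be built in one step with #formData:. A minimal sketch, assuming #formData: is sent to the request in the same way as #addFormKey:value: (only the selector and the shape of its argument are taken from the description above):

	stream := String new writeStream.
	(HttpRequest post: 'http://localhost/xx/ValueOfFoo')
		formData: (Array with: 'foo' -> 'bar' with: 'file' -> 'myFile');
		writeOn: stream.
	stream contents

Since #formData: replaces any previous form content, it should come before any #addFormKey:value: sends whose entries are meant to survive.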

An alternative way to post a form is through HttpClient. In this case the request gets executed automatically and the result is the response from the server.

	HttpClient new
		post: 'http://localhost/xx/ValueOfFoo'
		formData: (
			Array
				with: 'foo' -> 'bar'
				with: 'file' -> 'myFile').

To force the form to submit as a multipart message, send #beMultipart to the request at any point. Any previously added entries will be automatically converted to message parts. Note however that conversion of multipart messages back to simple messages is not supported, as it is not always possible without potentially losing information.

	stream := String new writeStream.
 	(HttpRequest post: 'http://localhost/xx/ValueOfFoo')
		beMultipart;
		addFormKey: 'foo' value: 'bar';
		addFormKey: 'file'  value: 'myFile';
		writeOn: stream.
	stream contents

and the result is

POST /xx/ValueOfFoo HTTP/1.1
Host: localhost
Content-type: multipart/form-data;boundary="=_vw0.98992842109405d_="
Content-length: 183

--=_vw0.98992842109405d_=
Content-disposition: form-data;name=foo

bar
--=_vw0.98992842109405d_=
Content-disposition: form-data;name=file

myFile
--=_vw0.98992842109405d_=--

File entries can be added using message #addFormKey:filename:source:. Adding a file entry automatically forces the message to become multipart to be able to capture both the entry key and the filename.

	stream := String new writeStream.
	(HttpRequest post: 'http://localhost/xx/ValueOfFoo')
		addFormKey: 'foo' value: 'bar';
		addFormKey: 'text'  filename: 'text.txt' source: 'some text' readStream;
		writeOn: stream.
	stream contents

POST /xx/ValueOfFoo HTTP/1.1
Host: localhost
Content-type: multipart/form-data;boundary="=_vw0.015112462460581d_="
Content-length: 247

--=_vw0.015112462460581d_=
Content-disposition: form-data;name=foo

bar
--=_vw0.015112462460581d_=
Content-type: text/plain;charset=utf_8
Content-disposition: form-data;name=text;filename=text.txt

some text
--=_vw0.015112462460581d_=--

Adding a file entry attempts to guess the appropriate Content-Type for that part from the filename extension. If it doesn't succeed, the content type is set to the default, i.e. application/octet-stream. File names with non-ASCII characters will be automatically encoded using UTF-8. UTF-8 will also be used for the file contents if the source is a character stream (as opposed to a byte stream).

Adding an entry to a multipart message returns the newly created part. That allows you to modify any of the default settings or to add new ones. Here's an example changing the filename and file contents encoding to ISO8859-2:

	stream := String new writeStream.
	request := HttpRequest post: 'http://localhost/xx/ValueOfFoo'.
	part := request addFormKey: 'czech'
				filename: 'kůň.txt'
				source: 'Příliš žluťoučký kůň úpěl ďábelské ódy.' withCRs readStream.
	part headerCharset: #'iso-8859-2';
		charset: #'iso-8859-2'.
	request writeOn: stream.
	stream contents

POST /xx/ValueOfFoo HTTP/1.1
Host: localhost
Content-type: multipart/form-data;boundary="=_vw0.74617905623567d_="
Content-length: 228

--=_vw0.74617905623567d_=
Content-type: text/plain;charset=iso-8859-2
Content-disposition: form-data;name=czech;filename="=?iso-8859-2?B?a/nyLnR4dA==?="

Pøíli¹ ¾lu»ouèký kùò úpìl ïábelské ódy.
--=_vw0.74617905623567d_=--

There's also an API to parse messages containing forms in any of the supported formats. Just send #formData to the HTTP message. The result is a collection of associations, the same shape as the input to the #formData: message.

 	(HttpRequest post: 'http://localhost/xx/ValueOfFoo')
		addFormKey: 'foo' value: 'bar';
		addFormKey: 'file'  value: 'myFile';
		formData

OrderedCollection ('foo'->'bar' 'file'->'myFile')

File entry values will be entire message parts so that all the associated information can be accessed.

	request := (HttpRequest post: 'http://localhost/xx/ValueOfFoo')
		addFormKey: 'foo' value: 'bar';
		addFormKey: 'text'  filename: 'text.txt' source: 'some text' readStream;
		yourself.
	part := request formData last value.
	part contents

some text

If you'd like to give the new code a try just load it up from the public repository.

posted by Martin Kobetic

general

ResourcefulTestCase

May 08, 2007 17:33:38 EDT

The core SUnit package provides support for shared test resources via the TestResource class. A TestCase that wants to use TestResources is expected to list all its resource classes in its class side #resources method. Individual test case methods then access the resources via the resource classes, usually as default, singleton instances. That provides potentially interesting levels of flexibility; however, access to the resources themselves is not exactly convenient. In my experience the vast majority of test cases involving TestResources keep repeating the 'self resources first default blah' incantation over and over again, where blah is the name of the real resource the case cares about, managed in a blah instance variable of the corresponding TestResource subclass. A more palatable way is to add the same instance variables to the TestCase subclass as well and copy the resource pieces there in the TestCase>>setUp method. Then you can access the resources directly as instance variables in your test case methods. This way the test methods are clean again, but when you want to employ test resources you still have to go through the following sequence of steps:

  1. create a TestResource subclass
  2. add it to the TestCase class>>resources method
  3. add the same set of instance variables to the TestCase subclass
  4. copy the contents of the TestResource default instance to the TestCase instance variables in TestCase>>setUp method
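
For comparison, the four steps might look roughly like this. It is only a sketch: MyResource, its client accessor and the 'testserver' host are made-up names, and #default is the singleton accessor as quoted above (some SUnit versions name it #current).

	"1. a TestResource subclass holding the shared object in an instance variable"
	MyResource>>setUp
		client := HttpClient new connect: 'testserver'

	MyResource>>client
		^client

	"2. list the resource class on the class side of the test case"
	MyTest class>>resources
		^Array with: MyResource

	"3. and 4. mirror the variable in the test case and copy it over in setUp"
	MyTest>>setUp
		client := MyResource default client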

Naturally, as you realize you're doing this over and over again, you start thinking: wouldn't it be nice if you didn't have to? My first thought was, what if the TestSuite that gets built out of a given TestCase subclass didn't simply create empty TestCase instances, but instead first created a prototype instance and invoked something like #suiteSetUp on it, which would allow it to initialize the shared resources and put them directly into the instance variables of the case prototype? Then all the test cases would be built by simply copying the prototype instance and therefore would have the resource inst vars automagically initialized the same way as the prototype. Tear down could be performed by invoking #suiteTearDown on either the prototype or any of the cases. It doesn't really matter which one, you just have to make sure it is executed exactly once.

Of course while I was enjoying myself following this thread of thought, I forgot one important detail. TestCase instances are not created just before the suite runs. They are often created much, much earlier. That of course directly conflicts with the desire to initialize the resources just before they are needed and shut them down right after the run is complete. I was just about to descend into yet another desperate attempt to rewrite most of SUnit when Alan Knight, who happened to be traveling with me at that moment, responded to my loud complaints with a simple: "Just put them into class variables."

And so the ResourcefulTestCase was born. It is an abstract TestCase subclass with simplified support for shared test resources. Instead of separate classes of test resources, it makes sure that if there are class side #setUp and #tearDown methods defined on the class, they run before and after the test suite that gets built out of this test class. This allows one to initialize and store shared resources in class side variables in the #setUp method. It's probably easiest to use shared class variables for easy access from the instance side test case methods. I also like that the capitalized first letter nicely highlights in the test code the difference between the shared resources and the private stuff in case instance variables. Obviously the resources need to be torn down in the class side #tearDown method. It is usually also desirable to nil out the variables so that the class doesn't hang on to garbage. The nilling out could be done automatically with a bit of meta-programming, but since it's often also important to finalize/close/release the resources properly, I figured it's better to force the user to deal with that, rather than facilitate potentially serious leaks with more automated magic. It's probably also a good idea to call the super implementations of the setUp/tearDown methods so that case hierarchies work well.

Here's a quick summary of how to create a test case with resources:

  1. Make the test class a subclass of ResourcefulTestCase:
    XProgramming.SUnit defineClass: #MyTest
    	superclass: #{XProgramming.SUnit.ResourcefulTestCase}
    	indexedType: #none
    	private: false
    	instanceVariableNames: ''
    	classInstanceVariableNames: ''
    	imports: ''
    	category: 'My Tests'
    
    
  2. Add class variables for the shared resources.
  3. Add class side setUp method and initialize the test resources there
    MyTest class>>setUp
    
    	A := Object new.
    	Client := HttpClient new connect: 'testserver'
    
    
  4. Add class side tearDown method releasing the resources
    MyTest class>>tearDown
    
    	A := nil.
    	Client close.
    	Client := nil.
    
    

And that's it. All nicely and intuitively packaged in a single class. Now you can simply use the resources in your test methods:

MyTest>>testResources

	self assert: A class == Object.
	self assert: (Client get: 'index.html') isSuccess
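
If test classes form a hierarchy, the class side methods would chain to super as suggested above. A sketch, assuming a hypothetical MySubTest subclass of MyTest with its own class variable B:

	MySubTest class>>setUp
		super setUp.		"let MyTest set up A and Client first"
		B := OrderedCollection new

	MySubTest class>>tearDown
		B := nil.
		super tearDown		"then let MyTest release its own resources"

Tearing down in the reverse order of setting up keeps the subclass resources from outliving anything they might depend on in the superclass.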

The supporting code is published in the public repository in a package called SUnit-SimpleResources.

posted by Martin Kobetic

general

No End of Line End Confusion.

February 15, 2007 18:53:13 EST

It's interesting that even though it's now decades old we're still running into this fundamentally rather trivial problem. It is especially pronounced in heavily cross-platform environments like VisualWorks. As soon as you have multiple environments, e.g. Windows and Linux, and you move text files between the two, you're pretty much guaranteed to run into files with doubled or tripled line-ends in them sooner or later. Often you can see both line-end conventions mixed up in the same file. The purpose of this article is to provide a reasonably complete picture of what's going on in VisualWorks in this regard and how we are supposed to deal with it. Even seasoned Smalltalkers sometimes miss some part of the whole picture, making it more difficult to deal with the consequences.

 

What's going on?

 

Here's a quick recapitulation. There are 3 line end conventions in common use: CR (MacOS), LF (Unix) and CRLF (Windows). CR and LF here refer to characters with ASCII code 13 and 10 respectively. This historical fact can have a profound effect on an environment with the ambition to have fully binary portable images across platforms. Let's say we try to emulate the platform specific line end convention, i.e. use characters CR and LF in String instances on Windows, just LF on Unix, etc. This would make binary portability pretty difficult. Whenever we move an image from Windows to Unix and start it up we would need to convert all the existing String instances from the CRLF convention to LF convention. What's worse it will effectively change the character size of many Strings. While that might be somewhat manageable (we already do something similar for platforms with different endianness) there are further implications. If strings get shortened, the positions of streams set up on top of them will suddenly be off as well. In fact any kind of "pointer" into the String will be potentially broken. That will be much harder to fix. And if it's not fixed, things will start breaking. Binary portability wouldn't really work.

So the only alternative is to pick one convention and stick with it everywhere. Historically the choice has been CR. I don't know why, maybe there was a certain affinity of the original Smalltalk-80 developers towards the Macintosh platform back then. Ironically, we may eventually end up being the only environment keeping the CR convention as MacOS/X is moving Macs away from the CR convention towards the LF convention of its Unix-based core. Nevertheless, the important point is that in Smalltalk line ends in String instances should *always* be marked with character CR only, no matter what platform it's running on. While character LF is a perfectly valid character and it's pretty easy to construct a String with that character in it, a String with LFs is usually a sign of some kind of anomaly and you can be sure that some components of the vast Smalltalk library won't be able to cope with it. This is very important, so remember, only CRs in your Strings! This simplification has other advantages too: things like reading a line of text from a stream become simply "upTo: Character cr" anywhere.
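
To make the last point concrete, here is a small workspace sketch that splits an in-memory text into lines using nothing but #upTo:. The sample string is made up; #withCRs turns each backslash into a CR.

	| stream lines |
	stream := 'first\second\third' withCRs readStream.
	lines := OrderedCollection new.
	[stream atEnd] whileFalse: [lines add: (stream upTo: Character cr)].
	lines		" => OrderedCollection ('first' 'second' 'third') "

The same loop works unchanged on any platform precisely because the String only ever contains CRs.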

 

Tools of the trade

 

OK, now that's settled, what about strings that come from outside? It's not a Smalltalk-only world out there (not yet anyway) and the text files on your hard drive will usually use whatever is the native convention for a given platform. The key point here is that these strings are coming from "outside". They should be converted to the Smalltalk convention (CR) as they are brought in. That's why Strings with LF characters in them are often a sign of a failure to convert properly.

The most common method to bring a String in from outside is to use an ExternalStream. And indeed if you take a closer look at those, you'll see that they have a configurable lineEndConvention. Setting an ExternalStream to #lineEndCRLF means that if you are using it to read, e.g., a file, it will automatically convert the byte sequence #[13 10] into a single character CR. When writing into such a stream, it will automatically produce the byte sequence #[13 10] whenever it is given character CR. Note that it will write #[10] if you give it character LF instead, and similarly it will give you an LF when reading if it encounters a standalone byte 10 (one that is not preceded by 13). This all assumes the stream is in text mode, which is the default. If the stream is set to binary mode, it will pass the byte Integers through as they are, without any conversions. But in that case you'll be getting ByteArrays out of the stream, not Strings. The last thing that the streams will do for you automatically is set a default line end convention based on the platform that the image is running on. So if you create an external stream on Linux, it will be set to #lineEndLF. On Windows it will be set to #lineEndCRLF, etc.

There's another kind of stream that provides these capabilities, the EncodedStream. The primary function of these is to convert bytes to characters according to a specified 'character encoding'. Character encoding is an even bigger can of worms, but for the purpose of this discussion let's just focus on the fact that EncodedStreams provide line end conversion the same way as external streams. In fact you can set an external stream to binary mode, wrap an EncodedStream around it and you should get the same results as with the external stream in text mode. This kind of setup will be handy in cases where you need to deal with character encodings that aren't supported by the external streams. There are in fact only very few of those that are supported by external streams: ISO8859_1, MSCP1252 for older Windows versions, and a few more obscure encodings. For anything else (e.g. UTF-8) you will need to use an EncodedStream. So it is actually quite important that the EncodedStream is polymorphic with ExternalStreams.

 

When tools need a hand

 

So far so good. We now know the tools that are available in the base image for dealing with line-end conversions. The automatic configuration for the native platform convention usually works out just fine when you're dealing with a single platform. But what if you're accessing a file system from Windows that is shared with Linux? These are the cases when you may need to intervene and set the line end convention appropriately yourself.
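
Such an intervention might look like the following sketch. The file name is made up, and #lineEndLF is assumed to be sent to the stream the same way as the #lineEndCRLF and #lineEndTransparent messages used elsewhere in this article.

	| stream text |
	stream := 'notes.txt' asFilename readStream.
	stream lineEndLF.		"the file was written on Linux, whatever platform the image runs on"
	text := [stream upToEnd] ensure: [stream close]

After this, text should contain only CRs as line ends, provided the file really uses the LF convention throughout.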

There's also a "pseudo" convention #lineEndAuto which makes the stream do a quick scan of an initial portion of the stream to determine which lineEndConvention to use based on the actual contents. This however may not be feasible on some types of external streams, e.g. socket streams, because it requires reliable positioning capability.

Yet another alternative is the following. Instead of insisting on a specific convention for the entire stream, we could simply convert any convention to CRs as the stream is being read. This is especially nice when you want to clean up a text with mixed conventions. Moreover this is actually doable without any pre-scanning and therefore could work on any kind of stream, even a stream that cannot be peeked and positioned. This functionality isn't available in the base image though, so I'm going to plug my AutoLineEndStream here, which does exactly that. It can be found in the ComputingStreams package in the public repository. With this the following will evaluate to true:



stream := #[10 13 10 10 13 13] asString readStream.
stream := AutoLineEndStream wrap: stream.
stream contents = '\\\\\' withCRs
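
To illustrate why no peeking is needed, here is a plain workspace sketch of the same normalization using only base classes. It is not the AutoLineEndStream implementation, just one way to get the same result: remember whether the previous character was a CR, and drop an LF that immediately follows one.

	| in out ch afterCR |
	in := #[10 13 10 10 13 13] asString readStream.
	out := String new writeStream.
	afterCR := false.
	[in atEnd] whileFalse: [
		ch := in next.
		"an LF right after a CR is the second half of a CRLF pair, so it is dropped"
		(ch = Character lf and: [afterCR])
			ifFalse: [out nextPut: (ch = Character lf ifTrue: [Character cr] ifFalse: [ch])].
		afterCR := ch = Character cr].
	out contents = '\\\\\' withCRs		" => true "

The input reads as LF, CRLF, LF, CR, CR, i.e. five line ends, hence the five CRs in the result.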


Note that AutoLineEndStream wraps a character stream (a stream that provides and consumes characters), e.g. an internal character stream or an external or encoded stream in text mode. The reason for this is the complexity of character encodings. Even characters CR and LF need to be encoded when they are converted to/from bytes, and the encoding isn't always just 13 or 10. A CR can be #[13 00] or #[00 13] in UTF-16 encoding depending on the endianness, and it can get even worse with other encodings. In the general case it is impossible to pick CR and LF out of an encoded text without decoding the other characters as well. So making AutoLineEndStream work with bytes would force it deep into the character encoding issues. It is much simpler to leave this domain to EncodedStreams and operate on characters instead; suddenly its task becomes trivial. So if you need to read an external file, simply keep the stream (external or encoded) in text mode, set it to #lineEndTransparent, which means don't do any conversion (all CRs and LFs will be preserved), and wrap it in AutoLineEndStream. Here's the corresponding code:



stream := 'messed up file.txt' asFilename readStream.
stream lineEndTransparent.
stream := AutoLineEndStream wrap: stream.


Note that #lineEndTransparent is actually equivalent to #lineEndCR, just named differently to emphasize the effective transparency of this mode, and in this case definitely expresses the intent better.

I should also mention that AutoLineEndStream will perform the same conversion when writing. If you write a CR and then an LF into the stream, it will write only a CR into the underlying stream. So if you happen to have a String with mixed up line ends in memory, simply writing it through an AutoLineEndStream should straighten that out as well. The following should also evaluate to true:



stream := String new writeStream.
stream := AutoLineEndStream wrap: stream.
stream nextPutAll: #[10 13 10 10 13 13] asString.
stream contents = '\\\\\' withCRs


That's all I can think of that is relevant to this topic. I believe the problem isn't particularly difficult to deal with once you have a good understanding of what is going on and what should be happening. I hope this article will help with that. Thanks for reading this far and if you have any corrections, suggestions, ideas, I'll be watching the comments eagerly.

posted by Martin Kobetic

cst

Cincom Smalltalk Summer Release

June 27, 2006 14:29:27 EDT

We will have the Summer Release of Cincom Smalltalk shipping as of June 30th. As soon as that's done, we'll post the new bits for NC download. The details:



Release Date: June 30, 2006

The summer releases of Cincom Smalltalk are maintenance/bug fix releases. As such, you won't see any new features coming out. You will see enhancements, bug fixes, etc.

The new VMs for Mac OS X (PowerPC-only and Intel Mac VMs) will follow after the summer release, but before the winter release. We will formally ship these new VMs in the winter release, but they will be available via vw-dev (and to other interested parties) before then.

Highlights for VisualWorks

  • Security - We are including an implementation of PKCS #8
  • COM - The COM Automation Wizard can now save and restore settings to create a VW COM server image.
  • Font Matching - on Font matching failures, the system will return the best match it can find instead of raising an exception
  • WS* - Further Enhancements to various aspects of our WebServices implementation
  • NetClients - We have implemented support for Digest Authentication and for NTLM Authentication

There are various other improvements and bug fixes; the file fixedARs.txt on the CD includes an exhaustive list.

Highlights for ObjectStudio

  • We are providing Early Access for ObjectStudio 8 by request only - please contact James Robertson if you are interested
  • OLE bug fixes and enhancements
  • Database bug fixes and enhancements
  • Both the XML Parser and the Opentalk framework have been synchronized with the VisualWorks implementations

posted by James Robertson

techtips

Skipping in non-positionable streams

May 22, 2006 12:50:18 EDT

The latest version of the ComputingStreams in the public repository adds two new stream classes, CachedReadStream and CachedWriteStream. I'm not proud of the well worn, "cached", name and welcome any suggestions, but that's what it is for now. These streams are the first step in my (hopefully) final battle with stream positioning. If you've ever used streams in anger, you'll probably agree that many problems require yanking the stream position back and forth frantically. That of course clashes quite badly with the fact that some streams are inherently non-positionable.

Streams and Positioning

The canonical example of a non-positionable stream in my world is a socket stream. Once you write something into it, you can forget about going back and rewriting that length prefix on that message that you have just finished marshaling, because by that time the prefix may well already be at the receiving side. That's an obvious one, but as years of working with various kinds of streams go by you'll find out that even though most streams in VisualWorks subclass from PositionableStream, many are not positionable or, what's worse, are only partially ... sometimes. For example even the socket streams, like all buffered external streams, do support the positioning API, and if you try, you'll find out that you are able to skip and peek ahead as long as you don't try to go too far back. So you're all happy that your little algorithm works with all kinds of streams until the point when your deployed algorithm happens to peek over the buffer boundary and blows up, unable to skip back. Or how about this bonus question: is a write stream on a file positionable or not? And how about a read-write stream on a file? OK, you might think, external streams are just messy, but internal streams should be clean, right? Hm, so what is the result of this one?

	(ByteArray new withEncoding: #utf8) readWriteStream
		lineEndCRLF;
		nextPutAll: 'abc';
		cr;
		nextPutAll: 'def';
		position: 5;
		next
I guess the #lineEndCRLF gives it away, but often with a multi-byte encoding like UTF-8, where individual characters may take anywhere from 1 to 6 bytes, your chances to nail the position you want to hit aren't that great. Now, to be fair, it seems that historically the stream library just wasn't designed with "stacking" streams on top of each other in mind, so positioning a character stream in terms of the encoded bytes in the specific case of EncodedStream isn't that big a deal in that context. However that just doesn't scale if you are dealing with streams potentially several levels of element translation deep. Imagine for example that you want to take your text, encode it in UTF-8, compress it, encrypt it and append a signature. You really need to position in terms of the elements at the top, not the bottom. Or more precisely, the positioning argument to a message sent to a stream must be expressed in terms of the elements that are written into or read from that particular stream. Otherwise you're forced to make assumptions about the overall composition of streams and completely break the abstraction layers. Just imagine that you are deploying your text processing application somewhere in China and you need to substitute some Chinese encoding for UTF-8, because UTF-8 is just too inefficient for that particular language.

Moreover, I'll argue that absolute positions themselves are the wrong abstraction. Streams in the most general sense are infinite: they don't end, they don't start, they just have their current position. Messages like #position, #atEnd, #reset, #contents don't even make sense there. If you rely on these messages in your stream processing you're placing severe restrictions on the kind of streams that you can deal with. And don't think this is just "too abstract". Socket streams, media streams, etc., are very much like that. The algorithm encoding a particular frame in a video stream shouldn't need any of the messages above. Similarly, asking a socket stream if it's #atEnd is nonsense as well. It cannot reliably answer false until the connection was properly closed and we can be sure that we've received everything that was sent by the other party. Until that point in time the correct answer to #atEnd is definitely not boolean.

So I hope I've convinced you now that absolute positioning and related messages like #atEnd should be avoided whenever possible. But since we still need to yank the current position of a stream around we obviously need to use at least some sort of relative positioning. That's what the #skip: message is for, as it allows positioning relative to the current position of the stream. Again, with element translation in mind it is important that even the relative positions are expressed in terms of the elements at the stream level to keep proper separation of abstractions. But #skip: definitely doesn't prevent (theoretically) infinite streams on either end.
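
As a reminder of what relative positioning looks like, here is a trivial sketch on an internal read stream, where #skip: is always safe:

	| stream |
	stream := 'abcdefgh' readStream.
	stream next: 3.		" => 'abc' "
	stream skip: -2.		"two elements back from the current position"
	stream next: 4		" => 'bcde' "

Nothing in this code refers to an absolute position, so in principle it makes the same sense on a stream with no defined beginning or end, as long as the stream can honor the backward skip.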

This is where the CachedStreams come in. They sit on top of an arbitrary stream and support skipping regardless of the capabilities of the stream. The stream will never be asked to move back, only forward. Skipping is performed in terms of the elements that the stream deals with, so if you feed characters into the stream you skip in terms of characters, regardless of the characters being converted into bytes using UTF-8 by some stream below.

As you've probably guessed, it's achieved by caching. The stream maintains a fixed size buffer that it uses to cache elements. Originally I started implementing it as a single read-write stream, but it turned out that cached reads and writes have some conflicting requirements, so I ended up splitting it into separate read and write streams. The reasons will hopefully become clearer after a more detailed description of each.

Skipping in read streams.

Let's take the CachedReadStream first. The cache is implemented as a circular buffer used as a FIFO queue for the elements. It has a pointer to the current "top", i.e. pointing at the latest element read in from the underlying stream, and a "position" pointer to the current position within the buffer. Consequently the position can back off from the top one full length of the buffer, but not more. Initially the buffer is empty and only fills up as elements are read from the stream. Once it fills up the "oldest" elements will start "falling through" the bottom of the buffer as new elements are added at the top. Skipping forward is unlimited but the cache has to follow along to the target position leaving necessary number of previous elements behind.
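
The "falling through" behavior can be illustrated with a bare Array used as a ring. This is only a workspace sketch of the idea, not the CachedReadStream code; the variable names are made up.

	| buffer top count |
	buffer := Array new: 5.
	top := 0.		"index of the most recently stored element"
	count := 0.		"number of valid elements, at most buffer size"
	'abcdefg' do: [:each |
		top := top \\ buffer size + 1.		"advance and wrap around"
		buffer at: top put: each.		"overwrites the oldest element once full"
		count := (count + 1) min: buffer size].
	"list the cached elements from oldest to newest"
	(1 to: count) collect: [:i | buffer at: (top - count + i - 1) \\ buffer size + 1]
		" => #($c $d $e $f $g) "

After seven elements pass through a buffer of five, $a and $b have fallen through the bottom, which is why a backward skip can reach at most one buffer length behind the top.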

The API is identical to other read streams, with the crucial difference that #skip: (and consequently #peek) behaves predictably and consistently regardless of the type of the underlying stream. This also motivated the addition of the message #previous, a counterpart of #next, but it's not clear to me how useful it is to read the elements in reverse order. I guess time will tell how useful this capability will be in practice.

Another important (deliberate) difference is that this stream explicitly translates the EndOfStreamNotification into a hard IncompleteNextCountError. I believe that the whole deal with the notification and returning a nil as the result of #next at the end of the stream is a bad mistake, so this is an attempt to start moving away from this behavior. We'll see if I have to back down on this one. The name EndOfStream for the error would probably be better, however the IncompleteNextCountError has been used for a very similar purpose for a long time so I have doubts about introducing a new exception class for almost the same thing. We'll see.

This would be a good time to show an example but since the API is the same traditional stuff, the best I can offer is this snippet of test code.

	| stream |
	stream := (ByteArray new withEncoding: #utf8) readWriteStream.
	stream nextPutAll: 'abcdefghijklmnopqrstuvw'.
	stream reset.
	stream := stream readCache: 5.
	stream skip: 10.
	stream next: 4.		" => 'klmn' "
	stream skip: -5.		" that wouldn't work with bare encoded stream! "
	stream next: 10.	" => 'jklmnopqrs' "
	stream nextAvailable: 100.	" =>  'tuvw' "
	stream next.		"raises an IncompleteNextCountError"
The implementation also pays special attention to the block based APIs based on #next:into:startingAt:, translating that to as few block based operations on the buffer and on the underlying stream as possible. This allows taking advantage of block based copying primitives and can push through larger quantities of data more efficiently. This is clearly an optimization, but the difference is significant enough in most non-trivial applications.

Skipping in write streams.

The CachedWriteStream is similar in many respects but there are significant differences. Written elements are cached in the buffer and only written down into the underlying stream when they fall through the bottom. This is to allow skipping back and rewriting the contents of the buffer arbitrarily until it gets flushed, which is useful for generating things like length prefixed message formats and such. This is the primary difference between the write and read streams. With read streams the position of the underlying stream is attached to the top of the buffer, but with write streams it is attached to the bottom. Another difference is that skipping forward in a write stream is also limited by the top of the buffer, because that represents the absolute end of the write stream, there are no more elements beyond it.

The write stream also provides reading capability within the confines of the buffer, because it is safe and easy to do and can be useful with some types of algorithms. Note however, that the reading and writing operations share the same position pointer. So a #next moves the position the same way as #nextPut:. So, for example

	(String new writeStream writeCache: 5)
		nextPutAll: 'abcdef';
		skip: -4;
		next: 2;
		nextPutAll: 'EF';
		contents 

yields 'abcdEF'. Otherwise the API is identical to other write streams. However, it is important to note that when you're done with the write stream you have to #flush or #close it in order to get the buffered elements written into the underlying stream. That applies equally to both internal and external streams. Sending #close to all streams when you're finished with them is a good habit to develop anyway; that's the only way to have your algorithms work properly with both internal and external streams. Some kinds of encodings (base-64, for example) also need to be told to flush at the end to terminate the encoding properly, even when working in memory.

The write stream also pays special attention to the block based APIs based on #next:putAll:startingAt:, translating that to as few block calls as possible.
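
The write side can be sketched the same way. Again this is Python and only an illustration under my own assumptions: CachedWriter and its write, read, skip and flush methods are hypothetical names, and a plain list stands in for the underlying stream. The point is to show elements falling through the bottom of the buffer, the shared read/write position, and the need for a final flush.

```python
class CachedWriter:
    """Keeps the last `size` written elements rewritable before they reach the sink."""

    def __init__(self, sink, size):
        self.sink = sink      # underlying list; receives elements that fall out
        self.size = size
        self.buffer = []      # the most recent elements, still rewritable
        self.pos = 0          # single position shared by reading and writing

    def write(self, items):
        for item in items:
            if self.pos < len(self.buffer):
                self.buffer[self.pos] = item       # overwrite inside the buffer
            else:
                self.buffer.append(item)           # extend the top
                if len(self.buffer) > self.size:   # oldest falls through the bottom
                    self.sink.append(self.buffer.pop(0))
                    self.pos -= 1
            self.pos += 1

    def read(self, count):
        if self.pos + count > len(self.buffer):
            raise EOFError("cannot read past the top of the buffer")
        out = self.buffer[self.pos:self.pos + count]
        self.pos += count
        return out

    def skip(self, delta):
        target = self.pos + delta
        if not 0 <= target <= len(self.buffer):
            raise ValueError("skipping is limited to the buffer")
        self.pos = target

    def flush(self):
        # Nothing buffered reaches the sink until this is called.
        self.sink.extend(self.buffer)
        self.buffer = []
        self.pos = 0


sink = []
writer = CachedWriter(sink, 5)
writer.write("abcdef")   # 'a' falls through, 'bcdef' stays in the buffer
writer.skip(-4)
writer.read(2)           # moves the position just like a write would
writer.write("EF")
writer.flush()
print("".join(sink))     # abcdEF
```

The same skip-back trick covers the length prefix case: write a placeholder, write the body, skip back over both, overwrite the placeholder with the real length, then skip forward again before flushing.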

Concluding Remarks

From the 300ft view the cached streams look very much like the buffered streams in the library, the only difference being that the buffer is circular with the cached stream. That, however, is the crucial difference, catering to a different purpose. The buffered streams are meant to accumulate small reads and writes to minimize the expense of the system read-write calls. A circular buffer wouldn't work at all for this purpose. Conversely, the discontinuous operation of these buffers is the cause of the socket stream blowing up on a simple peek when it happens to cross the buffer boundary. You really need a continuous (circular) buffer to support reliable skipping. Different purposes need different buffering strategies.

The cached streams should work well for any algorithm where you can predict the maximum size of the step back that you're going to need. However, that is not always a practical assumption. My favorite example for this is BER/DER encoding in ASN.1, where you have deeply nested trees of length prefixed encodings of pretty much arbitrary size. There's an idea I'd like to pursue in this regard, inspired by the "marking" capability I noticed in the java.nio.Buffer class when I was looking at the stream hierarchy in J2SE. I envision being able to #mark the current position in the stream to inform it that it needs to start caching, and being able to #reset back to that position. However, to be able to handle nested structures like BER encoding conveniently, I think I'll need to maintain a stack of markers. Also, since the buffer will need to grow arbitrarily, I plan to use a paged buffer so that I can grow and shrink it efficiently. Anyway, I'll see how that's going to pan out. It might make a nice topic for another post.

posted by Martin Kobetic

support

Cincom Smalltalk Support Resolutions

May 04, 2006 11:05:59 EDT

This comes from the Cincom Smalltalk Support team - it's a listing of resolutions that have gone out recently:

The resolutions listed below were developed by Cincom Smalltalk Support and Engineering to solve problems reported by our customers. These resolutions may or may not help solve any specific problem that you might encounter. We strongly advise you to back up your application before applying the suggested fix or work-around in case you may need to restore your application to its previous state. Resolutions can be viewed at http://SupportWeb.cincom.com.

Contents

  • Technical note
  • VisualWorks resolutions and patches
  • ObjectStudio resolution and patch builds

Technical Note -- Cloning a VW Store Repository (Resolution 89189)

Cincom has not officially published a way to clone your repository, and this operation isn't formally supported. But we do have some suggestions about how it can be done.

First, note that there is a currently unsupported package (widely used, though) called the Store Replicator. Many users load this and create replicas of their main repositories. It allows for cross-platform/cross-db replicas.

The other way is to make a backup copy of your repository using the DBA's backup commands for that db, and restore the database into a new server. Make sure that you include the VIEWS as well as the TABLES and INDEXES, and TABLESPACES, etc. To help verify that the process is complete, compare your results to all the database objects listed in the install script (generated by executing: DbRegistry createInstallScript). These should all be present on the new server.

After that, ensure that the sequence generator is working correctly. It should not be reset, since that would result in duplicate key values being generated. (There are ways to examine this in SQL.)

Update the name of the repository in the DatabaseIdentifier table to the new repository name (if you don't want it to clash with the original name, which was copied over into the cloned db table). This can be done in Smalltalk, or using SQL.

Finally, on Oracle, if you are cloning your db into a different database on the same server, make sure you update the VIEWs because, as copied, they will point to the original database (say, BERN.xxx). The views for the cloned repository should point into its own database (say, ABC.xxx). So, you'll have to update the VIEW by hand. This can be done in Smalltalk, or using SQL.

VisualWorks -- Newest Resolutions

Release Resolution ID Description
7.4 89294 How to delete exactly one package in Store?
7.4 89189 Duplicate of Oracle Store repository does not work. What's wrong?
7.4 89160 How to find which protocol a method belongs when the settings are set to show all methods when no protocol is selected.
7.4 88884 OracleConnection>>quiesce can mix up database because the rollback of the super class method is not performed.
7.4 88873 The VM crashes with "Out of memory" when a stack fault occurs (infinite recursion).
7.3.1 88702 WebToolKit does not consider TimeOut set in server console and resolver.
7.3.1 88681 We use WebToolKit with VisualWave and Tiny-Http-Webserver. Now we want to switch to an Apache or BEA-Logic web server.
7.3.1 88661 How to manipulate CrystalReports via COM-Connect.
7.3.1 88544 How to upgrade store and use parallel from both releases?
7.3.1 88541 VisualWorks under Windows 2003 Server, does it work?
7.3.1 88505 How can I get Visual Basic to work with the WizardExample.Apllication COM server example?
7.3.1 88346 Deleting in an InputField copies an empty line into PreviousSelections plus other problems with the CopyBuffer resp. PreviousSelections list.
7.3.1 88063 Ora error 00942: 'table or view not ...' when loading 7.1 application into new 7.3.1 image.
7.3.1 87659 Parcel not marked dirty on method changes.
7.3 88540 Do Toggle buttons exist in VW?
7.3 88474 Lens does not resolve proxy. Oracle error thrown.

VisualWorks -- Newest Patch

VW 7.3.1 AR 48943: Wrong X2O binding for a complex type with a few choices http://www.cincomsmalltalk.com/CincomSmalltalkWiki/VW+7.3.1+Patches

VW 7.4 AR 50145: VW Internet Explorer Plugin broken in 7.4 http://www.cincomsmalltalk.com/CincomSmalltalkWiki/VW+7.4+Patches

ObjectStudio(R) -- Newest Resolutions

Release Resolution ID Description
7.4 89294 How to delete exactly one package in Store?
7.4 89189 Duplicate of Oracle Store repository does not work. What's wrong?
7.4 89160 How to find which protocol a method belongs when the settings are set to show all methods when no protocol is selected.
7.4 88884 OracleConnection>>quiesce can mix up database because the rollback of the super class method is not performed.
7.4 88873 The VM crashes with "Out of memory" when a stack fault occurs (infinite recursion).
7.3.1 88702 WebToolKit does not consider TimeOut set in server console and resolver.
7.3.1 88681 We use WebToolKit with VisualWave and Tiny-Http-Webserver. Now we want to switch to an Apache or BEA-Logic web server.
7.3.1 88661 How to manipulate CrystalReports via COM-Connect.
7.3.1 88544 How to upgrade store and use parallel from both releases?
7.3.1 88541 VisualWorks under Windows 2003 Server, does it work?
7.3.1 88505 How can I get Visual Basic to work with the WizardExample.Apllication COM server example?
7.3.1 88346 Deleting in an InputField copies an empty line into PreviousSelections plus other problems with the CopyBuffer resp. PreviousSelections list.
7.3.1 88063 Ora error 00942: 'table or view not ...' when loading 7.1 application into new 7.3.1 image.
7.3.1 87659 Parcel not marked dirty on method changes.
7.3 88540 Do Toggle buttons exist in VW?
7.3 88474 Lens does not resolve proxy. Oracle error thrown.

ObjectStudio -- Newest Patch

ObjectStudio Patch builds are available on request. Please contact support at helpna@cincom.com (North America) or eurotsc@cincom.com (Europe).

ObjectStudio 7.1 Patch build os71f032306 contains the following fixes.

82247 Returns the correct table when accessing tables created by someone else when you are the owner of a same named table.

82203 Problem with Databases Controller 'Sending message #fieldNames to nil' when accessing DB2 tables.

82126 Improved word wrapping behavior for workplace object titles.

82120 Fixed debugger when setting RadioButton options to nil.

82101 In certain circumstances, the text editor could delete a file when trying to save it.

82029 Release WindowPen resources properly after drawing bitmap buttons.

82024 Correct a problem with OLE components marshalling strings. The problem would occur when the Smalltalk code defined the interface as VT_BSTR but the interface was actually (VT_BSTR | VT_BYREF).

81998 Correct errors in the Small Program Generator (spgen.txt).

81991 MS SQLServer, the SQL statement 'Select cnt = @@rowcount' was not working properly.

81907 Keep required support classes in the image when specifying to keep debugger interfaces from the ProgramGenerator.

81863 After fix for FR 34181, the DB2 wrapper did not report the problem correctly when warning message number was +100.

81859 Set initial focus properly (this problem existed only in a 7.0.1 patch release).

81855 Fix a Memory Access Violation with DB2 when loading tables with no $. specifier

81851 Fix problems comparing NAN with other numbers, (NAN = anyNumberOtherThanNAN) is always false now.

81836 Fix a focus problem - a window from another application was being given focus when closing a ModalDialogBox (this problem existed only in a 7.0.1 patch release).

81810 Fix a problem with fonts being incorrectly displayed as bold when a window was covered and then uncovered.

81796 Sybase MONEY, MONEY4 and DECIMAL types are now retrieved with precision 38 and scale 4 for greater accuracy.

81700 Width and Height were reversed in the inst var portRect when creating Bitmaps.

81699 transparentMask was not being set correctly for instances of TransparentBitmap. See new method TransparentBitmapClass>>newFromBitmap:transparentColor:

81685 Current cursor position (getSelection) was sometimes incorrect for FormString.

81516 The charWidth instance variable of a TabListCtrl is nil until the font has been set, which could sometimes cause a "Sending message #asFloat to nil" error. TabListCtrl>>charWidth now ensures that the font has been set.

81233 lostFocus and performValidationFor: behavior in ModalControllers was made consistent with that in non-modal Controllers.

  • The VM now sends #checkForAnotherPageAcceleratorActivation: to a PropertyPage on an Alt+ combination to implement a check for a mnemonic.
  • To activate a different page in the same PropertySheet, define the PropertyPage's name/title with an '&' before the character to be underlined.

81205 TabListCtrl now increases the rowHeight slightly to provide space for gridlines if the #Lined option is on.

81187 Fixed crash in SocketReadStream>>primUpTo: and possibly incorrect results from InternalStream>>primSkipTo:

81169 Duplicate instance variables were removed and code that modifies literal arrays was rewritten to not do so.

81129 Fixed GlobalDictionary>>commandLineOptionAt: to preserve the case of the command line arguments.

81097 Restored performance of String>>at:put: with very large receivers and indices over 32K (this performance had degraded in 6.9.1) and improved performance of sending #replaceFrom:to:with: to a String, thus improving StreamString.

81083 Copy & paste from Outlook into Workspace or browser in ObjectStudio does not allow re-paste into Outlook.

81061 The Items and Available Methods lists in the Designer's Return Key Method Assignment dialog are now sorted.

81047 Fix debugger in XML file parsing by defining PredefinedEntities.

81028 GlobalDictionary>>locale was treating the registry value HKEY_CURRENT_USER\Control Panel\International\Locale as a String representing a decimal number instead of a hexadecimal number, causing incorrect results.

80880 Created the method CharacterClass>>euro to answer the Euro symbol properly in both classic and Unicode versions of ObjectStudio, and use that method to initialize AvailableCurrency.

80748 Fix load error with CommBuilder.

80730 Avoid TopicBox flickering when running ObjectStudio with a manifest file on a machine using the Windows XP style appearance.

80589 Fix debugger 'Instance method #onReturn is not found in Class FormTreeView'.

80556 Fix TreeViewCtrl>>displaySample to always select the first item in the treeview after setting a new list, to make it visibly apparent when that treeview gets focus.

80495 WorkplaceObjects were not being properly displayed if they were under active child forms.

80157 Implemented FormItem>>handleBeep to avoid debugger from InterfacePart>> invalidDataFor:

79721 Made mouse scrollwheel work properly with TabListCtrl items in Win2000 and greater.

79805 Redesigned bitmap button implementation to properly display disabled state in all cases.

79412 ReadIniFileStreamclass>>file:onError: now answers the result of the error block on error.

79087 Module>>privateLoad:onError: was changed to answer the result of the error block on error.

78813 Wait cursor issues partially fixed, the wait cursor still doesn't display properly when selecting a drop down list box item by mouse.

78145 ControllerItem>>hitKeyKeyName: refactored to ensure that (System showWaitCursor) is always sent, and a VM flag to suspend displaying of the wait cursor is now maintained properly in all cases.

74888 Display of bitmaps from deployment image under Windows 2000 had wrong colors.

74351 String>>replaceExpression:with:options:times: was modified so the argument to with: could be as large as 16K characters instead of only 256 characters.

72767 ObjectStudio will now open a MDI subform maximized if that option is set in the Designer.

34216 Many implementors of privateExecSql: were modified to raise an error if the argument is nil.

32397 Display hourglass while loading an application.

ObjectStudio 7.1 build os701f030906 contains the following fixes. There are known issues with default button resulting in a double event and a GDI memory resource leak.

72767 ObjectStudio will now open a MDI subform maximized if that option is set in the Designer [updated].

74888 Can use a TransparentBitmap for button.

79087 Change Module class>>privateLoad:onError: to answer the result of evaluating the error block instead of the integer error code on error.

79805 Redesigned bitmap button implementation to properly display disabled state in all cases.

80053 OLE_DEMO Error: Avoid error in CustomerServiceController by not attempting to close the MS Word file if we haven't opened it yet.

80555 Further updates to avoid calling the default button action twice.

80706 Do not process mnemonic accelerator events for SubForms that don't have focus.

80880 Created the method CharacterClass>>euro to answer the Euro symbol properly in both classic and Unicode versions of ObjectStudio, and use that method to initialize AvailableCurrency.

81047 Initialize PredefinedEntities for XML Parser.

81233 Multiple related changes, including lostFocus (Form vs. ModalDialogBox), #performValidationFor:, keyboard handling.

81699 transparentMask was not being set correctly for instances of TransparentBitmap. See new method TransparentBitmapClass>>newFromBitmap:transparentColor:

81851 Float>>= was behaving in unpredictable ways when the receiver or argument was NAN. If either is NAN, it will now answer true only if both are NAN.

81998 spgen has errors during load.

82029 Correct Bitmap Button regressions, including resource leak.

82210 FormOLEControl items were not getting focus when tabbed into, and interrupting tab traversal sequence.

82240 MDI: maximize option, extra click needed to open login window.

82241 Regression fixed for regular expression.

82469 Horizontal grid lines disappeared when scrolling back up in TabListCtrl.

82505 Alt+ combinations were not opening menus as they should from MDI child windows.

82534 Fixed a crash when an OCX passed VT_EMPTY as an argument instead of the defined VT_VARIANT.

We value your feedback. Please feel free to send me your questions or concerns.

Kim Thomas,
Smalltalk Support Manager
kthomas@cincom.com
Cincom Systems, Inc.
55 Merchant Street
Cincinnati, Ohio 45246

posted by James Robertson

techtip

VW 7.4 Spoilers - ASN.1

December 23, 2005 00:54:02 EST

I've mentioned previously that we have spent a lot of time on ASN.1 in this release cycle, so I'd better say something about it. This article won't be an introduction to ASN.1, however; I want to focus on the improvements in our implementation. There are some easy introductions available and even free books for the gory details.

I figured that the best way to demonstrate the framework is to show how it's used in an application, and our most interesting application so far is the X.509 framework, so let's take a look at that. The X.509 framework is structurally fairly simple: you have a hierarchy of X509Objects representing various components of an X.509 Certificate and a few supporting classes like X509Registry or specific exception classes. The job of the ASN.1 framework is to turn X509Object instances into DER encoded bytes and back. To be able to do that the framework needs a structural description of the encoded bytes. It needs to know that an encoded certificate starts with an encoded TBSCertificate, then an identifier of the algorithm used to sign the contents of the TBSCertificate (TBS stands for 'to be signed' here) and finally the bytes of the signature itself. ASN.1 describes this structure using a C-ish notation. The Certificate definition looks as follows:


	Certificate  ::=  SEQUENCE  {
		tbsCertificate       TBSCertificate,
		signatureAlgorithm   AlgorithmIdentifier,
		signatureValue       BIT STRING  }


A SEQUENCE is like a struct in C and the elements show name first and type second. Our framework represents this information with a structure of ASN1.Type objects, closely following the ASN.1 expressions. In this case it would be something like


	(SEQUENCE name: #Certificate)
		addElement: #tbsCertificate type: TBSCertificate;
		addElement: #signatureAlgorithm type: AlgorithmIdentifier;
		addElement: #signatureValue type: BIT_STRING;
		yourself


This expression will not work correctly though, because the #type: arguments would have to be other ASN1.Type objects. If you build the type objects in the right order, making sure all the component types are created before the containing types, you might be able to build the structure by hand; however, that would be very inconvenient to maintain. It is OK for relatively simple structures, like


"	RSAPublicKey ::= SEQUENCE {
		modulus           INTEGER,  -- n
		publicExponent    INTEGER   -- e }
"
	(SEQUENCE name: #RSAPublicKey)
		addElement: #modulus type: INTEGER;
		addElement: #publicExponent type: INTEGER;
		yourself
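
For readers who haven't seen DER on the wire, here is a small standalone sketch of what such a SEQUENCE of two INTEGERs turns into. It is written in Python using only the standard library and has nothing to do with the ASN1 framework's own marshaling code; the function names are mine, and it only handles non-negative integers.

```python
def der_length(n):
    """Definite length octets: short form below 128, long form otherwise."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def der_tlv(tag, content):
    """Tag, length, value: the basic shape of every DER element."""
    return bytes([tag]) + der_length(len(content)) + content


def der_integer(value):
    """INTEGER (tag 0x02) as minimal big-endian two's complement.

    A leading zero byte appears whenever the top bit would otherwise be
    set, because that bit marks negative numbers.
    """
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    content = value.to_bytes(value.bit_length() // 8 + 1, "big")
    return der_tlv(0x02, content)


def der_sequence(*elements):
    """SEQUENCE (tag 0x30) is just the concatenation of its encoded elements."""
    return der_tlv(0x30, b"".join(elements))


def rsa_public_key(modulus, public_exponent):
    return der_sequence(der_integer(modulus), der_integer(public_exponent))


print(der_integer(65537).hex())           # 0203010001
print(rsa_public_key(0xE5, 65537).hex())  # 3009020200e50203010001
```

The tiny modulus 0xE5 is obviously not a real key; it is there only to show the extra zero byte in front of a value whose top bit is set.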


For the more complex cases the framework provides a more convenient mechanism, a Module. An ASN.1 Module is a container for a set of related Type definitions and consequently provides a context for lookup of types by name. As soon as you put a SEQUENCE into a Module, you can define its elements using type names instead of full instances and it also takes care of resolving forward references, i.e. you can define types in any order you wish, their mutual references will be resolved properly as the type definitions get added. The X509 framework maintains its module in a shared class variable X509Object.ASN1Module. Using a module the type definition for Certificate can look as follows:


	module := Module new: #X509.
	tCertificate :=
		(module SEQUENCE: #Certificate)
			addElement: #tbsCertificate type: #TBSCertificate;
			addElement: #signatureAlgorithm type: #AlgorithmIdentifier;
			addElement: #signatureValue type: #BIT_STRING;
			yourself
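
The forward reference handling is easy to picture with a toy model. The following Python sketch is only that: the Module, Sequence and Type classes here are hypothetical stand-ins, not the framework's classes, and it resolves names lazily at lookup time, whereas the real Module resolves references as the definitions get added.

```python
class Type:
    def __init__(self, name):
        self.name = name


class Sequence(Type):
    def __init__(self, name, module):
        super().__init__(name)
        self.module = module
        self.elements = []            # (element name, type name) pairs

    def add_element(self, name, type_name):
        self.elements.append((name, type_name))
        return self                   # allows chaining, a bit like a cascade

    def resolved_elements(self):
        # Names are looked up only now, so definition order does not matter.
        return [(name, self.module.lookup(type_name))
                for name, type_name in self.elements]


class Module:
    def __init__(self, name):
        self.name = name
        # One predefined type, just enough for the demo below.
        self.types = {"BIT_STRING": Type("BIT_STRING")}

    def sequence(self, name):
        self.types[name] = Sequence(name, self)
        return self.types[name]

    def lookup(self, type_name):
        try:
            return self.types[type_name]
        except KeyError:
            raise LookupError("unresolved type reference: " + type_name) from None


module = Module("X509")
certificate = (module.sequence("Certificate")
               .add_element("tbsCertificate", "TBSCertificate")
               .add_element("signatureAlgorithm", "AlgorithmIdentifier")
               .add_element("signatureValue", "BIT_STRING"))

# The component types are defined after the type that refers to them.
module.sequence("TBSCertificate")
module.sequence("AlgorithmIdentifier")

for element_name, element_type in certificate.resolved_elements():
    print(element_name, "->", element_type.name)
```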


Once we have the type definitions in place the marshaling framework knows enough about the encoded bytes; however, in order to be able to map objects to bytes, it needs to know how the types correspond to classes. For most of the "simple" types there's a predefined correspondence, i.e. BOOLEAN maps to Booleans, INTEGER to Integers, etc. However, SEQUENCE and SET types default to instances of ASN1.Struct, which is kind of like a Dictionary but with a few convenience gimmicks, like you can use the element names as accessors and such. But we don't want a Struct instance for Certificates, we want instances of the Certificate class. That's why the SEQUENCE and SET types have an optional 'mapping' attribute. You can tell it to map to a given Smalltalk class. It is the responsibility of the developer to make sure that the class provides all the expected accessor methods. The Certificate class does that of course, so all that needs to be done is to tell the type about it:


	tCertificate mapping: Certificate


The last type feature to discuss is encoding retention. X.509 has a fairly tortured history and the practical outcome of it is the rule to never ever re-encode a certificate. Therefore it is desirable for a certificate imported from outside to retain its DER encoding, in case it needs to be exported again. The retained encoding also serves as a cache, so writing out an object with retained encoding can simply dump the retained bits, instead of going through the encoding process.

Any Type can be told to retain its encoding. An encoding is captured in an instance of ASN1.Encoding pointing to the relevant bytes. The framework will pass the Encoding to the corresponding object using the #_encoding:type: message. The default implementation in Object will wrap the object in a TypeWrapper which has a slot to capture the encoding; however, it is expected that most applications will simply allocate a slot directly in the objects that will retain encoding and therefore will most likely override the method to store the Encoding there (see Certificate>>#_encoding:type: for example).

Note also that the encoding retention behavior was factored out of the marshaling machinery into a standalone EncodingPolicy object and is therefore completely customizable. An interesting side-effect of this is that with a customized EncodingPolicy you get a chance to intervene with the marshaling at interesting points in the process. One possible exploitation of this, which was very handy for us while trying to figure out bugs in the marshaling process, was the PrettyPrinter policy, which produces a map of the bytes on a text stream while marshaling.

Here's an example. Let's take something simpler than Certificate, for example the Name used for issuer or subject fields of Certificate. The full ASN.1 definition of Name is somewhat complex, but this is the part relevant to our example:


"	Name ::= CHOICE { RDNSequence }
	RDNSequence ::= SEQUENCE OF RelativeDistinguishedName
	RelativeDistinguishedName ::= SET OF AttributeTypeAndValue
	AttributeTypeAndValue ::= SEQUENCE {
		type     AttributeType,
		value    AttributeValue }
	AttributeType ::= OBJECT IDENTIFIER
	AttributeValue ::= ANY DEFINED BY AttributeType
"
	(module CHOICE: #Name)
		addElement: nil type: #RDNSequence;
		retainEncoding: true.
	module SEQUENCE: #RDNSequence OF: #RelativeDistinguishedName.
	module SET: #RelativeDistinguishedName OF: #AttributeTypeAndValue.
	(module SEQUENCE: #AttributeTypeAndValue)
		addElement: #type type: #AttributeType;
		addElement: #value type: #AttributeValue.
	module OBJECT_IDENTIFIER: #AttributeType.
	module ANY: #AttributeValue.


Basically a Name is a somewhat nested collection of attributes, where each attribute has a type and a value. Now let's unmarshal an encoded Name using the PrettyPrinter policy.


	bytes := 16r3068310B3009060355040613025553311330110603550407130A43696E63696E6E61746931173015060355040A130E43696E636F6D2053797374656D7331193017060355040B131043696E636F6D20536D616C6C74616C6B3110300E0603550403130754657374204341 asBigEndianByteArray.
	marshaler := DERStream with: bytes.
	output := String new writeStream.
	marshaler encodingPolicy: (PrettyPrinter on: output).
	marshaler reset.
	name := marshaler unmarshalObjectType: module Name.
	output contents


If all goes well the result of the above code should look as follows:


0	Name
0		RDNSequence
2			RelativeDistinguishedName
4				AttributeTypeAndValue
6					AttributeType
11					ObjectIdentifier(2.5.4.6)
11					AttributeValue
15					'US'
15				AttributeTypeAndValue {type ObjectIdentifier(2.5.4.6), value 'US'}
15			OrderedCollection (AttributeTypeAndValue {type ObjectIdentifier(2.5.4.6), value 'US'})
15			RelativeDistinguishedName
17				AttributeTypeAndValue
19					AttributeType
24					ObjectIdentifier(2.5.4.7)
24					AttributeValue
36					'Cincinnati'
36				AttributeTypeAndValue {type ObjectIdentifier(2.5.4.7), value 'Cincinnati'}
36			OrderedCollection (AttributeTypeAndValue {type ObjectIdentifier(2.5.4.7), value 'Cincinnati'})
36			RelativeDistinguishedName
38				AttributeTypeAndValue
40					AttributeType
45					ObjectIdentifier(2.5.4.10)
45					AttributeValue
61					'Cincom Systems'
61				AttributeTypeAndValue {type ObjectIdentifier(2.5.4.10), value 'Cincom Systems'}
61			OrderedCollection (AttributeTypeAndValue {type ObjectIdentifier(2.5.4.10), value 'Cincom Systems'})
61			RelativeDistinguishedName
63				AttributeTypeAndValue
65					AttributeType
70					ObjectIdentifier(2.5.4.11)
70					AttributeValue
88					'Cincom Smalltalk'
88				AttributeTypeAndValue {type ObjectIdentifier(2.5.4.11), value 'Cincom Smalltalk'}
88			OrderedCollection (AttributeTypeAndValue {type ObjectIdentifier(2.5.4.11), value 'Cincom Smalltalk'})
88			RelativeDistinguishedName
90				AttributeTypeAndValue
92					AttributeType
97					ObjectIdentifier(2.5.4.3)
97					AttributeValue
106					'Test CA'
106				AttributeTypeAndValue {type ObjectIdentifier(2.5.4.3), value 'Test CA'}
106			OrderedCollection (AttributeTypeAndValue {type ObjectIdentifier(2.5.4.3), value 'Test CA'})
106		OrderedCollection (OrderedCollection (AttributeTypeAndValue {type ObjectIdentifier(2.5.4.6), value 'US'})
OrderedCollection (AttributeTypeAndValue {type ObjectIdentifier(2.5.4.7), value 'Cincinnati'}) OrderedCollection 
(AttributeTypeAndValue {type ObjectIdentifier(2.5.4.10), value 'Cincom Systems'}) OrderedCollection
(AttributeTypeAndValue {type ObjectIdentifier(2.5.4.11), value 'Cincom Smalltalk'}) OrderedCollection 
(AttributeTypeAndValue {type ObjectIdentifier(2.5.4.3), value 'Test CA'}))
106	Name<RDNSequence:OrderedCollection/>

The numbers at the beginning of each line show offsets into the source bytes. Each unmarshaled entity has two entries in the output, its type at the offset where it starts and the value printString at the offset where it ends. Simple types will have 2 entries next to each other and constructed types will have their elements indented.
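
If you want to check those offsets without a VisualWorks image at hand, the same bytes can be walked with a few lines of Python. This is a deliberately naive tag-length-value walker of my own, not a port of the PrettyPrinter: it only understands short-form lengths and the four tags that occur in this Name, and the OID decoder assumes the first byte packs the first two arcs as 40*a + b.

```python
NAME_DER = bytes.fromhex(
    "3068310B3009060355040613025553"
    "311330110603550407130A43696E63696E6E617469"
    "31173015060355040A130E43696E636F6D2053797374656D73"
    "31193017060355040B131043696E636F6D20536D616C6C74616C6B"
    "3110300E0603550403130754657374204341"
)

TAG_NAMES = {0x30: "SEQUENCE", 0x31: "SET"}


def decode_oid(content):
    """Decode OBJECT IDENTIFIER contents into dotted notation."""
    arcs = [content[0] // 40, content[0] % 40]
    value = 0
    for byte in content[1:]:
        value = (value << 7) | (byte & 0x7F)   # 7 payload bits per byte
        if not byte & 0x80:                    # high bit clear ends the arc
            arcs.append(value)
            value = 0
    return ".".join(str(arc) for arc in arcs)


def walk(data, start=0, end=None, depth=0, found=None):
    """Collect (start, end, depth, description) for every element."""
    found = [] if found is None else found
    end = len(data) if end is None else end
    pos = start
    while pos < end:
        tag, length = data[pos], data[pos + 1]
        if length & 0x80:
            raise ValueError("long-form lengths are not handled in this sketch")
        body, stop = pos + 2, pos + 2 + length
        if tag & 0x20:                         # constructed: descend into contents
            found.append((pos, stop, depth, TAG_NAMES.get(tag, hex(tag))))
            walk(data, body, stop, depth + 1, found)
        elif tag == 0x06:                      # OBJECT IDENTIFIER
            found.append((pos, stop, depth, decode_oid(data[body:stop])))
        else:                                  # here: PrintableString (0x13)
            found.append((pos, stop, depth, data[body:stop].decode("ascii")))
        pos = stop
    return found


for begin, stop, depth, text in walk(NAME_DER):
    print("%d-%d\t%s%s" % (begin, stop, "  " * depth, text))
```

Each line shows where an element starts and ends, which should line up with the paired type/value entries in the PrettyPrinter output above.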

There's much more to talk about in ASN.1, things like tagging, sub-typing, constraints, etc. But this post is again getting long, so I'll pick this up some other time. If the release is out before I get to it, you can find more about ASN.1 in the release notes and in the shiny new ASN.1 chapter of the SecurityGuide.pdf. Until then, thanks for reading this far and Happy Holidays!

posted by Martin Kobetic

techtip

VW 7.4 Spoilers - X.509

December 13, 2005 01:00:26 EST

A considerable amount of time in this release cycle was spent on improving our ASN.1 support. I don't want to get into details of this rather technical topic (maybe next time), but I'd like to show how much we've gained from it. As it happens ASN.1 is one of the principal building blocks of many security related standards, for example the PKCS suite of standards, CMS and S/MIME and obviously the ITU-T's X-series of recommendations where ASN.1 itself comes from (X.208). As you've probably guessed the X.509 certificate standard is part of the X-series as well but it is also published as RFC 3280. Anyway, enough with the barrage of acronyms, my point is, it's very useful to have it. So let's get back to X.509 certificates.

VisualWorks has supported X.509 certificates for a while; however, until now it was the bare minimum necessary to conduct an SSL handshake. To be fair that's still quite a bit of functionality because you need to be able to decode pretty much any certificate, model all aspects of a certificate, validate all kinds of certificate properties and certificate chains etc. But it's still a far cry from full X.509 support. The most glaring limitation was the inability to generate and encode certificates. With the new ASN.1 framework, we were able to rip out pretty much all the special purpose marshaling code from the X.509 framework and in exchange get not only the ability to decode but also to encode certificates. So here's how you can generate a self signed certificate with VW7.4.

An X.509 certificate binds a "name" of a subject to a "public key". The certificate itself is signed by the issuer of the certificate and the "name" of the issuer must also be present on the certificate. So to create the certificate we'll need the name of the subject, the name of the issuer and the subject's public key. A self signed certificate is one where the issuer and the subject are the same entity, therefore the subject and issuer names are the same. Self-signed certificates are commonly used for certificate authority (CA) certificates, because they conveniently bundle the authority name with their public key. So let's define the name:

	caName := Security.X509.Name new
				add: 'C' -> 'US';
				add: 'L' -> 'Cincinnati';
				add: 'O' -> 'Cincom Systems';
				add: 'OU' -> 'Cincom Smalltalk';
				add: 'CN'-> 'Test Certificate Authority';
				yourself.

A name in X509 is actually a collection of so called AttributeValueAssertions where the keys represent well known aspects of a name. The various attributes are C=Country, L=Locality, O=Organization, OU=Organizational Unit, CN=Common Name, etc. Yes, the APIs could be nicer, but we're not there yet and I want to present code that actually works, so please, bear with me.
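
To show what those key/value pairs end up as on the wire, here is a standalone Python sketch that DER encodes such a name by hand. It is not what Security.X509.Name does internally; the helper names are mine, it assumes every value is encoded as a PrintableString (which is what the sample Name in the ASN.1 post above uses), and it only supports short-form lengths.

```python
ATTRIBUTE_OIDS = {
    "C": "2.5.4.6",
    "L": "2.5.4.7",
    "O": "2.5.4.10",
    "OU": "2.5.4.11",
    "CN": "2.5.4.3",
}


def tlv(tag, content):
    """Tag, length, value with a one-byte (short form) length."""
    if len(content) >= 0x80:
        raise ValueError("long-form lengths are not handled in this sketch")
    return bytes([tag, len(content)]) + content


def encode_oid(dotted):
    arcs = [int(arc) for arc in dotted.split(".")]
    out = bytearray([arcs[0] * 40 + arcs[1]])   # first two arcs share one byte
    for arc in arcs[2:]:
        chunk = [arc & 0x7F]                    # last byte has the high bit clear
        arc >>= 7
        while arc:
            chunk.append(0x80 | (arc & 0x7F))   # earlier bytes have it set
            arc >>= 7
        out.extend(reversed(chunk))
    return tlv(0x06, bytes(out))


def encode_name(pairs):
    """Name = SEQUENCE OF (SET OF (SEQUENCE { type OID, value string }))."""
    rdns = b""
    for key, value in pairs:
        attribute = tlv(0x30, encode_oid(ATTRIBUTE_OIDS[key])
                        + tlv(0x13, value.encode("ascii")))
        rdns += tlv(0x31, attribute)            # one attribute per SET here
    return tlv(0x30, rdns)


ca_name = [
    ("C", "US"),
    ("L", "Cincinnati"),
    ("O", "Cincom Systems"),
    ("OU", "Cincom Smalltalk"),
    ("CN", "Test Certificate Authority"),
]
print(encode_name(ca_name).hex())
```

With 'Test CA' as the common name, this produces exactly the byte sequence that is unmarshaled in the ASN.1 post.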

We'll also need keys. Let's say we want the CA to use RSA keys. Here's how to generate some.

	caKeys := Security.RSAKeyGenerator keySize: 1024.
	caKeys publicKey

Note that key generation is actually a fairly expensive process so depending on your hardware this may take a few seconds. I'm purposely not storing the public key, because the generator will cache both keys until it's flushed. At this point I just wanted to trigger key generation so that we're done with it.

Now we are ready to create the Certificate. There are a few more required attributes like the serial number and validity dates, but those are self explanatory.

	ca := Security.X509.Certificate new
			serialNumber: 1000;
			issuer: caName;
			subject: caName;
			notBefore: Date today;
			notAfter: (Date today addDays: 7);
			publicKey: caKeys publicKey;
			forCertificateSigning;
			yourself.

The #forCertificateSigning bit is necessary for a CA certificate, but let's not worry about that now. We're almost done. The last missing bit on our shiny new certificate is the signature. The magic spell for that is #signUsing:, which takes an instance of a signing algorithm preinitialized with a key as the argument. Of course the key has to be the private key, but you already knew that, right?

	ca signUsing: (
		Security.RSA new
			useSHA;
			privateKey: caKeys privateKey;
			yourself).

And there it is. If you ask the certificate for its #printOpenSSLString, you should get something like this:

Certificate:
	Data:
		Version: 3 (0x2)
		Serial Number: 
			03:e8
		Signature Algorithm: sha-1WithRSAEncryption
		Issuer: C=US, L=Cincinnati, O=Cincom Systems, OU=Cincom Smalltalk, CN=Test Certificate Authority
		Validity
			Not Before: Dec 13 00:00:00 2005 GMT
			Not After : Dec 20 00:00:00 2005 GMT
		Subject: C=US, L=Cincinnati, O=Cincom Systems, OU=Cincom Smalltalk, CN=Test Certificate Authority
		Subject Public Key Info:
			Public Key Algorithm: rsaEncryption
			RSA Public Key: (1024 bits)
				Modulus (1024 bit):
					00:e5:4e:70:0d:65:7f:11:98:a3:2c:37:5a:0a:6d:
					ab:8f:28:92:fc:f9:db:f7:9c:1a:fa:01:a5:96:95:
					24:da:1c:ad:6b:18:65:cd:96:66:dd:e3:90:c8:2a:
					f6:62:ba:03:04:ec:ed:e0:db:f6:ab:65:93:84:4c:
					ef:94:12:a2:cb:14:b5:f2:15:c1:cf:37:9f:fb:e4:
					3a:ae:5e:3f:fb:9f:21:71:15:de:b8:20:c8:e8:8d:
					59:28:bf:ae:85:35:1a:9b:81:3f:b3:cc:d5:35:a1:
					da:3f:2e:dc:ca:cb:38:5e:33:a5:98:cf:7d:9f:2e:
					3e:99:ce:22:0f:21:26:24:37
				Exponent: 65537 (0x10001)
		X509v3 extensions:
			KeyUsage: critical
			X509v3 Basic Constraints: critical
			CA:TRUE
	Signature Algorithm: sha-1WithRSAEncryption
		97:dc:d6:dd:6f:d1:ce:08:d4:f8:d6:5f:bd:70:f1:ac:6a:7a:
		96:58:8c:b0:29:db:2b:82:43:6b:a3:f5:72:55:f5:c2:42:80:
		2c:4a:da:99:e0:be:e8:dd:55:df:45:69:8c:64:d8:5d:bf:78:
		42:0b:2a:89:19:3b:ae:b8:fa:db:b5:66:f3:1f:84:2a:e8:ab:
		09:5f:e6:48:03:25:60:98:a4:29:42:a1:6d:1e:69:82:b9:81:
		63:d1:3e:23:74:df:a2:cd:e3:c5:ca:a3:d6:da:1a:67:37:8b:
		50:cf:16:47:e6:17:ae:df:2b:a4:56:6e:06:58:58:c2:b4:24:
		ab:37

This gives you a complete certificate expressed as a Smalltalk object, but how can you share it with the rest of the world? That's where our new encoding capability gets involved.

	marshaler := ASN1.DERStream on: (ByteArray new: 100).
	marshaler marshalObject: ca withType: ca asn1Type.
	marshaler contents

The result of this is a byte array representing the DER encoding of the certificate.
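
If you'd like a quick sanity check before involving any external tools, you can peek at the first few bytes. This is only a sketch resting on plain DER rules rather than on anything specific to the Security library: a certificate is an ASN.1 SEQUENCE, so the encoding should start with tag byte 16r30, and for an encoding between 256 and 65535 bytes long (an assumption that should hold for a 1024-bit RSA certificate like this one) the length is announced by 16r82 followed by two length bytes. It uses only standard collection messages on the byte array answered by the marshaler; the variable names der and declaredLength are mine.

```smalltalk
der := marshaler contents.
"A certificate is an ASN.1 SEQUENCE, so its DER encoding starts with tag 16r30."
der first = 16r30 ifFalse: [self error: 'not a DER SEQUENCE'].
"Assumption: 16r82 announces a long-form length held in the next two bytes."
(der at: 2) = 16r82 ifFalse: [self error: 'unexpected length form'].
declaredLength := ((der at: 3) bitShift: 8) + (der at: 4).
"The declared content length plus the 4 header bytes should cover every byte."
declaredLength + 4 = der size
```

If everything lines up, the last expression should evaluate to true. A failure here would suggest the bytes were truncated or padded somewhere along the way.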

To prove that I'm not making all this up, let's see how OpenSSL likes our new certificate. OpenSSL comes with a handy little utility called, surprisingly, openssl. You can usually find OpenSSL pre-installed on most Unix-based platforms, and if you're using Windows you should have Cygwin installed on it already. So give the following a try as well.

First we need to save the DER bytes into a file. The certificate actually caches its DER encoding, so we don't need to invoke the above again; the following will suffice:

	'TestCA.der' asFilename writeStream
		binary;
		nextPutAll: ca encoding source;
		close

After this you should have a file TestCA.der in your image directory. To run it through OpenSSL, execute the following in your favourite shell:

	openssl x509 -inform DER -in TestCA.der -text

With any luck you should get the following in response.

Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number: 1000 (0x3e8)
        Signature Algorithm: sha1WithRSAEncryption
        Issuer: C=US, L=Cincinnati, O=Cincom Systems, OU=Cincom Smalltalk, CN=Test Certificate Authority
        Validity
            Not Before: Dec 13 05:00:00 2005 GMT
            Not After : Dec 20 05:00:00 2005 GMT
        Subject: C=US, L=Cincinnati, O=Cincom Systems, OU=Cincom Smalltalk, CN=Test Certificate Authority
        Subject Public Key Info:
            Public Key Algorithm: rsaEncryption
            RSA Public Key: (1024 bit)
                Modulus (1024 bit):
                    00:e5:4e:70:0d:65:7f:11:98:a3:2c:37:5a:0a:6d:
                    ab:8f:28:92:fc:f9:db:f7:9c:1a:fa:01:a5:96:95:
                    24:da:1c:ad:6b:18:65:cd:96:66:dd:e3:90:c8:2a:
                    f6:62:ba:03:04:ec:ed:e0:db:f6:ab:65:93:84:4c:
                    ef:94:12:a2:cb:14:b5:f2:15:c1:cf:37:9f:fb:e4:
                    3a:ae:5e:3f:fb:9f:21:71:15:de:b8:20:c8:e8:8d:
                    59:28:bf:ae:85:35:1a:9b:81:3f:b3:cc:d5:35:a1:
                    da:3f:2e:dc:ca:cb:38:5e:33:a5:98:cf:7d:9f:2e:
                    3e:99:ce:22:0f:21:26:24:37
                Exponent: 65537 (0x10001)
        X509v3 extensions:
            X509v3 Key Usage: critical
                Digital Signature, Certificate Sign
            X509v3 Basic Constraints: critical
                CA:TRUE
    Signature Algorithm: sha1WithRSAEncryption
        97:dc:d6:dd:6f:d1:ce:08:d4:f8:d6:5f:bd:70:f1:ac:6a:7a:
        96:58:8c:b0:29:db:2b:82:43:6b:a3:f5:72:55:f5:c2:42:80:
        2c:4a:da:99:e0:be:e8:dd:55:df:45:69:8c:64:d8:5d:bf:78:
        42:0b:2a:89:19:3b:ae:b8:fa:db:b5:66:f3:1f:84:2a:e8:ab:
        09:5f:e6:48:03:25:60:98:a4:29:42:a1:6d:1e:69:82:b9:81:
        63:d1:3e:23:74:df:a2:cd:e3:c5:ca:a3:d6:da:1a:67:37:8b:
        50:cf:16:47:e6:17:ae:df:2b:a4:56:6e:06:58:58:c2:b4:24:
        ab:37
-----BEGIN CERTIFICATE-----
MIICjzCCAfigAwIBAgICA+gwDQYJKoZIhvcNAQEFBQAwezELMAkGA1UEBhMCVVMx
EzARBgNVBAcTCkNpbmNpbm5hdGkxFzAVBgNVBAoTDkNpbmNvbSBTeXN0ZW1zMRkw
FwYDVQQLExBDaW5jb20gU21hbGx0YWxrMSMwIQYDVQQDExpUZXN0IENlcnRpZmlj
YXRlIEF1dGhvcml0eTAeFw0wNTEyMTMwNTAwMDBaFw0wNTEyMjAwNTAwMDBaMHsx
CzAJBgNVBAYTAlVTMRMwEQYDVQQHEwpDaW5jaW5uYXRpMRcwFQYDVQQKEw5DaW5j
b20gU3lzdGVtczEZMBcGA1UECxMQQ2luY29tIFNtYWxsdGFsazEjMCEGA1UEAxMa
VGVzdCBDZXJ0aWZpY2F0ZSBBdXRob3JpdHkwgZ0wCwYJKoZIhvcNAQEBA4GNADCB
iQKBgQDlTnANZX8RmKMsN1oKbauPKJL8+dv3nBr6AaWWlSTaHK1rGGXNlmbd45DI
KvZiugME7O3g2/arZZOETO+UEqLLFLXyFcHPN5/75DquXj/7nyFxFd64IMjojVko
v66FNRqbgT+zzNU1odo/LtzKyzheM6WYz32fLj6ZziIPISYkNwIDAQABoyQwIjAP
BgNVHQ8BAf8EBQMDB4QAMA8GA1UdEwEB/wQFMAMBAf8wDQYJKoZIhvcNAQEFBQAD
gYEAl9zW3W/RzgjU+NZfvXDxrGp6lliMsCnbK4JDa6P1clX1wkKALErameC+6N1V
30VpjGTYXb94QgsqiRk7rrj627Vm8x+EKuirCV/mSAMlYJikKUKhbR5pgrmBY9E+
I3Tfos3jxcqj1toaZzeLUM8WR+YXrt8rpFZuBlhYwrQkqzc=
-----END CERTIFICATE-----

So, this is what it takes to generate a certificate in VW7.4. The APIs definitely need more work, but they're fairly usable already. I think that even with a fairly minimalistic UI slapped on top of this it should be possible to run a private PKI hierarchy.

Actually, I guess I should show how to "issue" a certificate. It is nothing more than generating a new certificate for a given subject, containing the subject's public key and signed using the issuer's private key. So we need a subject name and keys.

	subjectName := Security.X509.Name new add: 'CN' -> 'Test Subject'; yourself.
	subjectKeys := Security.DSAKeyGenerator keySize: 1024.
	subjectKeys publicKey

The certificate is created as before. Note, however, that it is important to express the usage of the associated key properly. So let's say the keys for this certificate can be used for signing data but not other certificates (you can find more examples in the 'accessing - key usage' protocol on Certificate).

	subject := Security.X509.Certificate new
			serialNumber: 1001;	"serial numbers must be unique per issuer; 1000 is already used by the CA certificate"
			issuer: caName;
			subject: subjectName;
			notBefore: Date today;
			notAfter: (Date today addDays: 7);
			publicKey: subjectKeys publicKey;
			forSigning;
			yourself.

And finally the signature.

	subject signUsing: (
		Security.RSA new
			useSHA;
			privateKey: caKeys privateKey;
			yourself).

We'll leave encoding of this new certificate as an exercise for the attentive reader. Thanks for reading this far, I hope you enjoyed the article.
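
For readers who would like to check their answer, here is one possible solution sketch. It simply repeats the marshaling and file-writing steps shown earlier for the CA certificate, with subject in place of ca, and writes the marshaler's bytes directly rather than relying on the cached encoding. The file name TestSubject.der is just an illustrative choice.

```smalltalk
"Marshal the issued certificate to DER, exactly as was done for the CA certificate."
marshaler := ASN1.DERStream on: (ByteArray new: 100).
marshaler marshalObject: subject withType: subject asn1Type.
"Write the DER bytes to a file in the image directory (hypothetical file name)."
'TestSubject.der' asFilename writeStream
	binary;
	nextPutAll: marshaler contents;
	close
```

The resulting file can be inspected with the same openssl x509 command as before, pointed at TestSubject.der instead of TestCA.der.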

posted by Martin Kobetic