versioning

Grinding versions ever finer

November 30, 2004 6:19:03.020

Version control is becoming a hot topic, it seems: there's been an explosion of alternative version control systems, and a lot of associated discussion in the blog world. Reading about and experimenting with these systems has made me realize how relatively easy we have it in the Smalltalk world - for example, Martin Pool writes:

There are two basic complex merge problems: cherry-picking changes, and history-sensitive merging within an arbitrary graph. I don't think any single existing system (open or closed) can do both correctly, but I think it is possible.
As it happens, Monticello does do both of those correctly, but not because we're better hackers or have more time than the arch or darcs guys; I rather suspect the opposite is true. It's just that working within the restricted domain of Smalltalk code makes all the problems so much easier, because we have so much richer a model of what source code is: everyone else is held back by having to solve every problem in the general case of an arbitrary text file. The hoops this makes you jump through - like the impressive but, in the Smalltalk case, wholly unnecessary Theory of Patches developed for darcs - can be quite astonishing. Unfortunately, the other thing I've come to realize is that Monticello suffers somewhat from the same disease: Monticello's cherry-picking support, for example, is a very neat technical solution, but its complexity is a strong indication that there's something inherently wrong with the model.

The answer, as I see it, is to even more firmly embrace the advantages we have in Smalltalk, like the ability to individually address versions of single methods. This is something that ENVY got right, though ENVY fails utterly when it comes to supporting merging. But I think there's a sweet spot when you combine ENVY-like granularity with Monticello-like distributed ancestry trees, and Colin and I have started playing with some spike code around that idea. More on this as we explore it further.
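To make the idea concrete, here is a minimal sketch in Python of what "ENVY-like granularity with Monticello-like ancestry" could look like as a data model. It is not the spike code mentioned above, and every name in it (MethodVersion, ancestors, common_ancestors) is a hypothetical chosen for illustration: each version of a single method is its own immutable node, and its parents form an ancestry graph that a merge could walk.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class MethodVersion:
    """One immutable version of a single method (hypothetical model)."""

    class_name: str
    selector: str
    source: str
    parents: tuple[MethodVersion, ...] = ()  # more than one parent means a merge

    def ancestors(self) -> set[MethodVersion]:
        """Return this version plus every version reachable through parents."""
        seen: set[MethodVersion] = set()
        stack = [self]
        while stack:
            version = stack.pop()
            if version not in seen:
                seen.add(version)
                stack.extend(version.parents)
        return seen


def is_ancestor(older: MethodVersion, newer: MethodVersion) -> bool:
    """True if `older` appears somewhere in the history of `newer`."""
    return older in newer.ancestors()


def common_ancestors(x: MethodVersion, y: MethodVersion) -> set[MethodVersion]:
    """Versions that appear in the history of both `x` and `y`."""
    return x.ancestors() & y.ancestors()


# Two developers edit the same method independently, then the edits are merged.
base = MethodVersion("Account", "balance", "^ total")
a = MethodVersion("Account", "balance", "^ total ifNil: [0]", (base,))
b = MethodVersion("Account", "balance", "^ total rounded", (base,))
merged = MethodVersion("Account", "balance", "^ (total ifNil: [0]) rounded", (a, b))
```

Because the unit of history is the method rather than the file or the package, questions like "has this change already been applied here?" reduce to graph reachability on a very small graph.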

Comments

[] November 30, 2004 12:02:01.121

How does Monticello handle the versioning of non-code assets, such as Photoshop files, configuration files, database schema and the like? Not everything in a program is code, but it all needs to be in the revision control system.

Non-code assets

[Avi Bryant] November 30, 2004 15:09:39.665

Most of Monticello couldn't care less what kinds of artifacts it is versioning - there's a well defined protocol that a Definition has to comply with, and you can (and people do) extend it with subclasses that model things other than methods, classes, etc. That's assuming your content is inside the Smalltalk image, however, or can be made to look like it is. If you had external files, you'd have to write some code that represented them as a set of objects that could be versioned at some reasonable granularity.
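As a rough illustration of that idea, here is a sketch in Python rather than Smalltalk. The Definition interface below is an assumption made up for this example (a key that identifies the artifact and a content string to compare), not Monticello's actual protocol, and FileDefinition is a hypothetical subclass standing in for an external file that has been wrapped as an object.

```python
from abc import ABC, abstractmethod


class Definition(ABC):
    """Anything that can be versioned (hypothetical interface)."""

    @abstractmethod
    def key(self) -> tuple:
        """Identity of the artifact, stable across versions."""

    @abstractmethod
    def content(self) -> str:
        """The versioned content, compared to detect modifications."""


class MethodDefinition(Definition):
    def __init__(self, class_name: str, selector: str, source: str):
        self.class_name, self.selector, self.source = class_name, selector, source

    def key(self) -> tuple:
        return ("method", self.class_name, self.selector)

    def content(self) -> str:
        return self.source


class FileDefinition(Definition):
    """An external file represented as an object (hypothetical)."""

    def __init__(self, path: str, text: str):
        self.path, self.text = path, text

    def key(self) -> tuple:
        return ("file", self.path)

    def content(self) -> str:
        return self.text


def diff(old: list, new: list) -> tuple:
    """Compare two snapshots; return (added, removed, modified) keys."""
    old_by_key = {d.key(): d.content() for d in old}
    new_by_key = {d.key(): d.content() for d in new}
    added = sorted(new_by_key.keys() - old_by_key.keys())
    removed = sorted(old_by_key.keys() - new_by_key.keys())
    modified = sorted(
        k for k in old_by_key.keys() & new_by_key.keys()
        if old_by_key[k] != new_by_key[k]
    )
    return added, removed, modified


old = [MethodDefinition("Account", "balance", "^ total"),
       FileDefinition("schema.sql", "CREATE TABLE a;")]
new = [MethodDefinition("Account", "balance", "^ total ifNil: [0]"),
       FileDefinition("logo.txt", "v1")]
added, removed, modified = diff(old, new)
```

The diffing code never asks what kind of artifact it is looking at, which is the sense in which the tool "couldn't care less"; the granularity question is pushed into how you choose keys for your own subclasses.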

[] December 1, 2004 8:37:43.671

That was my point. For most systems, the code of the program is a relatively small part of the overall assets. A lot of work is being done by non-programmers -- graphic designers, content authors and the like -- and needs to be kept in the revision control system. In a Smalltalk app, that work is not in the Smalltalk image.

In the image?

[Avi Bryant] December 1, 2004 13:59:42.644

"In a Smalltalk app, that work is not in the Smalltalk image."

Are you speaking as someone who's written Smalltalk apps? On the Smalltalk apps I've worked on, that work most definitely *has* been in the Smalltalk image (occasionally with the exception of icons and logos and the like, which I haven't found a need for formal configuration management of). Why wouldn't it be? Among other things, it makes deployment much, much simpler.

Cherry-picking

[Ian Bicking] December 1, 2004 18:56:58.084

Reading your description of the cherry-picking in Monticello (and acknowledging that I may misunderstand it), it seems like a non-problem in file-based version control. patch is a kind of lame tool (at least I've never been that comfortable with it), but it sounds like you're talking about selectively applying a patch? Or is this along the lines of the theory of patches, where you are detecting that patch X had been applied in the past, so that you can find a common ancestor for the contributor (who developed patch X), the current branch (which had the patch applied, then had additional changes), and the current contributor's code (which has patch X and their own independent developments)? Or, if patch X wasn't applied, it would compose it as two patches? Blah, I don't understand all the issues.

Which is why I suspect Smalltalk isn't as far along as other systems -- it's easy to think you've got it, but people who are thinking very hard about version control haven't gotten it. They don't appear that far ahead, but it's because they are going in circles. And they are going in circles because there are many false leads, and because each real gain is hard won. And it's usually debatable if it's a real gain ;)

To me, the image seems very challenging from a usability point of view. Images are monolithic, while most version control systems work on filesystems which are rather distributed. You can have two branches or checkouts sitting alongside each other, and tools can access either just fine, compare, contrast, etc. In Smalltalk that's two discrete systems. Maybe if you could run select code in a specific branch, while the rest of the system remained in a different branch. But that seems infeasible, and even if it wasn't, since it's utterly inconcrete it's also rather opaque.

One of the good things about Subversion is that it made all the branching concrete. A branch is a separate directory. In CVS it was an invisible concept; you had to make a separate distinct checkout, and you had to imagine you knew where you were. Branching operations were separate from all the others -- not orthogonal to other operations, but also not the same as other operations. The image seems stuck in that; only with the image you can't even make two separate checkouts, you can't move between worlds. And this seems to be such an issue because the environment and the image and the code you are using are all one and the same; the programming tools don't operate on the image, they are the image.

Of course, it doesn't have to be this way. Smalltalk could go into files just fine, e.g., GNU Smalltalk. But then Smalltalk, like many other technologies (Java, .NET, Zope), isn't just a language. Unfortunately because of that you can't pick out the best parts, or the parts most appropriate for a problem domain, or the parts you understand; you either take it or leave it.

Cherry Picking

[Avi Bryant] December 1, 2004 20:41:33.045

Ian,

Yes, applying a patch is easy, and I don't know of any Smalltalk that doesn't have some simple way to do that (traditionally called "change sets"). Monticello's cherry-picking is more like the "theory of patches" you mention - for example, it can detect conflicts intelligently if I cherry pick some changes from branch A to put them into branch B, and then later on decide to merge B back into A. I don't actually know of any file-based tool that does that, though there's probably one out there somewhere.
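A simplified sketch of that scenario in Python follows. It is a plain three-way merge over per-method snapshots, which is an assumption for illustration and not Monticello's actual algorithm; three_way_merge and the snapshot dictionaries (selector mapped to source) are hypothetical names. It shows why method-level granularity helps: a change cherry-picked from A into B is recognized as identical on both sides when B is merged back, so only genuinely divergent edits are reported as conflicts.

```python
def three_way_merge(base: dict, ours: dict, theirs: dict) -> tuple:
    """Merge two snapshots against their common ancestor, method by method.

    Returns (merged snapshot, sorted list of conflicting selectors).
    A missing selector is treated as None, so additions and removals
    go through the same comparisons as edits.
    """
    merged, conflicts = {}, []
    for selector in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(selector), ours.get(selector), theirs.get(selector)
        if o == t:
            result = o  # both sides agree, e.g. a cherry-picked change
        elif o == b:
            result = t  # only their side changed
        elif t == b:
            result = o  # only our side changed
        else:
            conflicts.append(selector)
            result = o  # keep ours until someone resolves it by hand
        if result is not None:
            merged[selector] = result
    return merged, conflicts


base = {"balance": "^ total", "deposit:": "total := total + amount"}

# Branch A changes both methods.
branch_a = {"balance": "^ total ifNil: [0]",
            "deposit:": "total := self balance + amount"}

# Branch B cherry-picks A's new #balance and adds a method of its own.
branch_b = {"balance": "^ total ifNil: [0]",
            "deposit:": "total := total + amount",
            "withdraw:": "total := total - amount"}

merged, conflicts = three_way_merge(base, branch_a, branch_b)

# If B had also edited #deposit: differently, that one method would conflict.
branch_b_divergent = dict(branch_b, **{"deposit:": "total := total + amount abs"})
_, divergent_conflicts = three_way_merge(base, branch_a, branch_b_divergent)
```

A real history-sensitive merge would also have to pick the right base from the ancestry graph; this sketch takes the base as given.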

Your point about images is interesting, although I've never found it to be much of a problem in practice. Having separate images is very similar to having separate checkouts (though, conveniently, it's a separate checkout of *everything* you need for the app, including all the external libraries you use etc). How often do you run scripts that compare two separate checkouts? I always do that kinda thing through the version control system rather than by directly comparing two directories on my local filesystem. Your focus on "concreteness" hits home, for me: the image ends up being an extremely concrete representation of a particular branch or a particular project. I completely agree with your assessment of branching in CVS (it still makes me feel uneasy and somewhat lost), but I have none of the same feelings using Monticello and Squeak. This may be due to the relative ease with which I can create a whole new world (without any version control at all, you can "branch" in Smalltalk by doing a Save As... of your image). Creating a new world in most languages (say, to try out python 2.4) is a whole lot more heavyweight.

Anyway, thanks for the thoughtful comment. It's very clear that The Image is a big barrier (conceptual, social, technical) between Smalltalk and many potential users, and it's always interesting to explore why. If you have any more thoughts on use cases where you think image-based development would get in your way, I'd be interested to hear them.

Multiple repositories

[Ian Bicking] December 2, 2004 17:56:29.674

I generally work with several repositories simultaneously. Sometimes for good reasons, sometimes not. Developing open source, I'm often using other people's software, frequently from a checkout rather than a formally released version. Or similarly, if I'm developing something to be used by other people, I will recombine and test different versions -- e.g., run something under python2.2 and then python2.4, or running it against the last release of a package and a current checkout. And come to think of it, there's often more than one person editing the same checkout; e.g., someone working on templates while I'm working on code. So the sharing gets all mixed up, it's a many-to-many relationship between people and checkouts and repositories, and all for good reasons.

Getting back to multiple repositories, I'm also not doing monolithic branching; rather I'm branching individual pieces, and recombining them. This isn't trivial or entirely robust -- you have to mess with the lookup paths for modules and otherwise control the runtime environment. But it's not hard either. And the whole thing can be automated easily, through a shell script, through configuration, or through introspection. As I think about it, this might relate just as much to the IDE; my IDE is the command line and Emacs. Maybe to someone using a more Smalltalk-like IDE the contrast wouldn't be as great, and the idea of clicking around to do things wouldn't seem so crude. Of course, in the always-heterogeneous environment of open source, IDEs don't get far if they don't speak the lowest common denominator -- ASCII source, without annotation. Actually, IDEs don't get far at all, no matter what.
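For readers who have not done this, here is a minimal sketch of the kind of lookup-path juggling described above, using only the Python standard library. The helper names (use_checkout, load_from_checkout) are hypothetical, and the approach assumes each checkout is a directory containing an importable top-level module.

```python
import importlib
import sys
from contextlib import contextmanager


@contextmanager
def use_checkout(path):
    """Temporarily put one checkout directory first on the module search path."""
    sys.path.insert(0, path)
    try:
        yield
    finally:
        sys.path.remove(path)  # removes the first occurrence, i.e. the one inserted above


def load_from_checkout(module_name, path):
    """Import `module_name` from the checkout at `path`, ignoring other copies."""
    with use_checkout(path):
        # Forget any copy already imported from a different checkout.
        sys.modules.pop(module_name, None)
        # Make sure the import system notices files created since the last import.
        importlib.invalidate_caches()
        return importlib.import_module(module_name)
```

Calling load_from_checkout once with the path of a released package and once with the path of a current checkout gives two module objects that can be tested side by side in one process. As the comment says, this is not entirely robust: modules that were already imported by other code keep pointing at whichever copy they saw first.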

It's been several years since I've used Smalltalk or kept up with the community, but I always noticed that the sharing was not nearly as granular as other environments, and the open source development always seemed sparse. That never really made sense to me -- there are lots of talented programmers who want to share what they are doing and are excited about their projects, and the environment is pleasant and rewards self-motivated programmers. But it's hard to share a small piece of code with other people -- at least it was in the environments I used -- and it felt hard to cooperate intimately on limited-scope projects. I'm sure that's not the only reason, and maybe things have gotten better since then, but something about Smalltalk has always seemed to reflect the culture out of which it arose -- a bunch of smart people living and working closely together, dedicated to a common (if sometimes ambiguous) goal. The values are somewhat similar to the open source community, but the process is very different, and those differences go deep.

Hmm... I used the term crude up there, without thinking about it; a kind of Freudian slip. I suspect GUI users (i.e., Smalltalkers) feel the same about the command line and our quaint ASCII files. An interesting symmetry behind the divide. And thinking about the difference in origins, I think there's a tendency for us to talk past each other -- we confuse our shared values with a shared foundation, and we don't actually have that. That's why many of the people who critique Smalltalk think the responses miss the point -- there's always an underlying unspoken desire, and we don't appreciate that our desires are very different. Of course, understanding this doesn't necessarily bring us closer if our differences are truly fundamental.

[] January 5, 2005 12:57:56.311

"On the Smalltalk apps I've worked on, that work most definitely *has* been in the Smalltalk image (occasionally with the exception of icons and logos and the like, which I haven't found a need for formal configuration management of). Why wouldn't it be? Among other things, it makes deployment much, much simpler."

Ok, some examples from the non-Smalltalk world. How would these work in Smalltalk?

1) A racing simulation game. This had a huge graphical aspect, created by a graphic designer. Most of the application source tree was Photoshop files. The game loaded simulation data dynamically. Simulation data was created by a domain expert in a spreadsheet. All of the graphics and simulation data had to be versioned. The build process generated the actual graphical resources from the Photoshop files and linked them into the final executable.

2) A business application that interfaces to, and includes changes to, a huge legacy database. Many other applications, written in a wide variety of languages, also use the database. Scripts to change the database must be stored in version control, so that they can be used to create development environments matching different database and application versions.

How would these situations be handled in a version control system that only works with an image (the equivalent of the linked, running executable)? In the first example, by the time that the images are in the executable they are no longer editable and so versioning is of no use. In the second example, the database definition and change scripts need to be shared among different applications and development teams.