development

The "Excitement" of LINQ

February 28, 2006 9:19:15.330

I got my hardcopy of SDTimes a few days ago, but have been waiting for it to go online before writing about the special report on the CLR. The article is written by Larry O'Brien, and I think he's a bit too breathless in his praise for it. Some of the inconsistency starts early. He makes a point about how most developers prefer type systems like what C, C++, and Java have - which, based on practice in the industry, is certainly true:

Certainly, the programming community has voted repeatedly to embrace languages that derive from C. This preference is a combination of an appealing syntax (explicit typing is appealing for programming in the large), professionalism by association (real programmers use curly braces) and performance (all popular C-derived languages have rejected “everything is an object” for at least some primitive types).

While I'd call those flaws, most people in the industry don't agree with me. So why do I say that the inconsistency starts early? Well, Larry is writing about LINQ (Language INtegrated Query) like it's the second coming for software developers. Here, let me quote him:

To understand the power of LINQ and how it seems foreshadowed in the evolution of C#, forgive a quick foray into code. Consider Listing 1 (page 29), which gives a hint of a LINQ-like function called “FindAll.” After the initial “using” statements (which make available important classes), we define a new type of function called a Predicate. Predicates are functions that take a value and do some calculation that results in a Boolean evaluation. In C#, function signatures such as this are first-class language features called “delegates.” These have been a feature of C# since the beginning. Our Predicate function, though, works on any type; we don’t need a separate definition for functions that evaluate integers, or strings or customer records. This “generic” functionality was added to C# in the 2.0 version.

If you take a look at FindAll, you see that, for convenience, it’s “static” (using it does not require an instance of type “Program” to have been instantiated) and publicly visible. Ignoring the parameterized type “T”s that sprinkle its signature, it should be clear that FindAll outputs a List after taking as input a List and a Predicate. The passed-in Predicate evaluates each of the values in the passed-in list to build the returned list (the predicate is applied with the call to function(value)). Note how generic it is: It makes no assumptions on the type of values it operates on and no assumptions on the workings of the supplied Predicate.

After telling us how valuable explicit typing is, the first thing you notice is that the example is using generics to avoid explicit typing. Kind of makes you wonder how valuable it is, if the real power is available only if you chuck the concept. Here's the code listing he refers to, and that's where I really wanted to make a point:


using System;
using System.Collections.Generic;

delegate bool Predicate<T>(T value);

class Program
{
	static public List<T>FindAll<T>(List<T> inList, Predicate<T>predicate)
	{
		List<T> retval = new List<T>();
		foreach (T value in inList)
		{
			if (predicate(value))
			{
				retval.Add(value);
			}
		}
		return retval;
	}

	static void Main(string[] args)
	{
		List<int> intList = new List<int>();
		for (int i = 0; i < 10; i++)
		{
			intList.Add(i);
		}

		for (int modulus = 2; modulus < 5; modulus++)
		{
			// Notice how a new function is defined here
			List<int>evenList = FindAll<int>(intList, delegate(int i)
			{
				return i % modulus == 0;
			});
			Console.Write("The even multiples of " 
					+ modulus + " in range[1..10] are = ");
			foreach (int i in evenList)
			{
				Console.write(i + " ");
			}
			Console.WriteLine();
		}
		Console.In.ReadLine();
	}
}

What that does is take a list of numbers from 1 to 10, then iterate over them, pulling out first the numbers that are 0 modulo 2, 3, and 4. The above is supposed to be a model of simplicity, showing the power of LINQ in the CLR. Here's Larry again:

By combining delegates, generics and the closurelike "anonymous delegates with outer variable capture" you can create very concise expressions for working with collections. Of course, there is much more to LINQ. A complete SQL-like query ability is much more complex than what is shown here. The structural issues of selection and extension go beyond C# 2.0’s capabilities. And ultimately there’s a qualitative difference between a querying API and query built into the syntax of the language. Nevertheless, there’s a distinct feeling of design decisions made years ago contributing to a capability only now being previewed.

Well. The problem is, this - they had to build it into the CLR with a bunch of syntax. The equivalent Smalltalk?


| list |
list := #(1 2 3 4 5 6 7 8 9 10).
2 to: 4 do: [:index | | answers |
	answers := list select: [:each | (each \\ index) = 0].
	Transcript show: 'values equal to 0 modulo ', 
				index printString, ' ', 
				answers printString.
	Transcript cr]

Gee, that's simpler, isn't it? And unlike the CLR, Smalltalk didn't need to bake that query into the syntax - it's just a library message that any collection understands. Meaning, it's generic without all that extraneous syntax. Microsoft, like Sun, is grasping weakly at the power that Smalltalk already has via added complexity. Apparently, the "value" of explicit typing is such that you need to saddle developers with a pile of syntax in order to deliver "power".

I don't really need to go to the trouble of defining the generic function up front, because #select: is already in the library. But wait - there's more! Say I wanted to add a new query "function" to all collections? Easy - I find class Collection in the system, and I add the method. Bam - there it is. I can make sure to version it off in my own package so that only people who need it will have it loaded as well.

The thing I find interesting is the Java/CLR approach to power - it always involves adding a pile of new syntax to the language - which is why the books defining those systems keep getting thicker. You need a scorecard just to keep up with the syntax.

At the end of an associated column on Iron Python, column, Larry brings up something I addressed the other day:

#Smalltalk (pronounced Sharp Smalltalk, and available from www.refactory.com) seems to be close to a full implementation of the Smalltalk language but does not have the workspace/browser environment that many consider the heart of Smalltalk’s power. Similarly, while there are a few Lisp-like languages for .NET, there’s not a CLOS environment for the CLI. Whether this is because the CLI erects technical roadblocks of progress or because there’s insufficient motivation for commercial or open-source development of such environments, it’s regrettable and only serves to further the dominance of C# on the platform.

As I stated the other day, we actually looked at the CLR as a host for Smalltalk. It simply wasn't suitable, given the level of investment we would have had to make.

Comments

Voted?

[Patrick Logan] February 28, 2006 9:49:00.000

Instead of "the programming community has voted repeatedly to embrace languages that derive from C"...

Perhaps this should be "blindly stumbled into". The industry's preferences are hardly an informed choice of well-explained alternatives. Where we are is more based on history (the computing power available to most people historically as opposed to what is available to them today) and sociology and psychology ("most people would sooner die than think. In fact most do." -Bertrand Russell).

 

sigh...

[Bill Mill] February 28, 2006 10:11:05.000

And people *intentionally* write code in a language that looks like that. How you can write C# or Java generics code and sleep well at night is beyond me.

Crystal Spheres

[Ryan] February 28, 2006 10:58:44.000

My notes from last September's PDC are littered with comments such as: "Just like passing a block in Ruby, only harder", "This is new?!  Lisp has had this for 40 years!", and "Combining the clean syntax of C++ with the elegance of Java".  What really struck me were all the lengths MS was going to in order to preserve explicit typing, and how many hoops they were jumping through in order to avoid making everything an object in .NET.  (There are performance reasons for that, but they seem more like burdens most of the time.)

oh...

[Ryan] February 28, 2006 11:07:47.000

The "Crystal Spheres" title was allusion to the Greeks' crystal spheres theory in early astronomy, and was meant as an analogy to the ways that many languages add features through complexity instead of changing to simpler, more powerful paradigms.  I got distracted and forgot to make that clear in my original comment.

explicit vs implicit : static vs dynamic

[Isaac Gouy] February 28, 2006 13:56:31.000

Larry O'Brien wrote 'much sound and fury about explicit versus implicit typing (often wrongly labeled "static versus dynamic" typing)'
wrongly labeled? Isn't explicit vs implicit orthogonal to static vs dynamic?

As long as we're comparing apples and oranges...

[] February 28, 2006 14:11:20.000

#include

int main(int c, char* v[])
{
for(int m=2;m<5;++m){
std::cout << "\nvalues equal to 0 modulo " << m << ": ";
for(int i=1; i<11; ++i)
if(not (i%m)) std::cout << i << " ";
}
}

The equivalent Smalltalk? - Not

[Isaac Gouy] February 28, 2006 14:21:42.000

James wrote "Smalltalk ... it's generic without all that extraneous syntax"
Smalltalk dynamic type-checking is not equivalent to static type-checking with generics.

You made the same incorrect claim a few months ago. Static type-checking (with or without generics) will detect a missing method such-as peekForInstVarWrite.

Isaac...

[ James Robertson] February 28, 2006 14:36:37.132

Comment by James Robertson

Detection of a missing method is utterly irrelevant. Why? Because such a flaw would turn up in initial testing. If you think explicit typing spares you from the need to test, please, don't develop anything mission critical. As to the "apples/oranges" guy - the post pointed out a use of C# based generics and LINQ. Your example executes the same end result, but does not illustrate what O'Brien was actually talking about. meaning, you brought the disparate fruit to the party. The Smalltalk equivalent of that is what I posted - simple closure usage. Never mind Isaac's pedantic desire to hang on to useless type statements.

turn up in initial testing - Not

[Isaac Gouy] February 28, 2006 14:56:27.761

James wrote "such a flaw would turn up in initial testing..."
It turned up in your shipped product: VW 7.4 (See "Lens broken" on vwnc@cs.uiuc.edu)

Lots wrong here...

[Ian Griffiths] February 28, 2006 15:08:28.494

To be fair, there are a lot of mistakes in Larry's article too, so it's not all your fault. (Equating function pointers with LISP s-expressions being amongst the milder of the flaws, astonishingly enough!) As expositions of LINQ go, he made it pretty easy to misunderstand. But you made some mistakes of your own.

Have you done any research beyond reading Larry's article? I'm afraid a lot of what you've inferred from his article isn't really correct.

For example, you wrote "The above is supposed to be a model of simplicity, showing the power of LINQ in the CLR." Actually, no it isn't. It's a C# 2.0 example that doesn't use LINQ at all. Larry described this example as hinting (in his opinion) at what was to come with LINQ. It's very much not an example of LINQ.

You also wrote: "they had to build it into the CLR with a bunch of syntax". I'm afraid that doesn't even make sense - it's not only untrue it's not even possible. The CLR doesn't have any syntax. So you can't build anything into the CLR with syntax. The CLR is language-independent, hence 'Common Language Runtime', so it doesn't have an intrinsic syntax. It's just a runtime (hence 'Common Language Runtime') designed to be used from any number of different languages. To talk of the CLR having syntax seems to show a fundamental misunderstanding of what the CLR really is.

What you're actually looking at there is C# source code that implements a search feature. As far as I can tell, Larry is posting a sample implementation of the kind of library function that he believes LINQ will provide. If you've understood what the code does then there's no reasonable way you can interpret it as being a bunch of syntax specific to implementing a search feature. It's an implementation of one search feature, and it uses nothing but general purpose C# 2.0 syntax to implement that feature.

You also wrote: "And unlike the CLR, Smalltalk didn't need to bake that query into the syntax " Here I'm left wondering what on earth was going through your head when you wrote that. Clearly you understood the C# example well enough to know what it was doing because you wrote a Smalltalk equivalent. So I'm genuinely perplexed that you thought that this was an example of a query being baked in. It's true that there are proposed C# 3.0 language features that involve baking in some query features, but this isn't it.

Moreover, the baking on of query-oriented language features in C# 3.0 doesn't involve embedding any query behaviour into the language as such. It's just a mapping of certain syntax onto method calls that match a specific name. So when you say this: "Smalltalk didn't need to bake that query into the syntax - it's just a library message that any collection understands" you've summed up exactly the thinking behind LINQ. It defines a bunch of standard methods (or messages, as I gather you like to call them in the Smalltalk world) that all collections will understand. LINQ is all about the library design. C# 3.0 happens to have proposed some new language features that know all about these standard methods, but that's not part of LINQ; it's a language feature designed to exploit LINQ. (And I sincerely hope that this particular language feature doesn't make it in. It's the least interesting and the least flexible of the proposed new C# 3.0 features, and it's completely unnecessary for using LINQ.)

If you were looking at an actual LINQ example, it would have looked simpler because the FindAll function wouldn't need to have been written - you'd just use the Where method provided by the library. Depending on which collection class you were using, you'd either end up with a collection-specific Where, or the generic Where that's able to operate on any collection. The binding to this Where method occurs using completely ordinary method binding mechanisms - .NET languages won't need any special new features to make use of LINQ.

And if it were a C# 3.0 example, you could get it a lot less verbose. The collection initialization syntax would remove the need for the first loop - you could use something very similar to your Smalltalk example. And the core of the code - the part that generates a list of items with a value 0 modulo whatever looks like this:

var answers = intList.Where(x => (x % mod) == 0);

This is using the LINQ library function 'Where'. I'm also using a couple of new C# 3.0 language features here - implicitly-but-statically-typed variables (var) and lambdas. The C# 2.0 equivalent (still using LINQ - like I said you don't need new language features to use it) looks a bit more verbose:

List answers = intList.Where(delegate (int x) { return (x % mod) == 0; });

The key point you've got wrong here is that LINQ is not about new syntax, it's about new library features.

Another mistake (one Larry also sort of makes) is that you conflate explicit typing with static typing. The two are very different. Try using ML for a while - it's entirely statically typed, and yet uses very little explicit typing. It's all about type inference.

C# is moving away from explicit typing, but remaining very much statically typed. Look at features like var and anonymous types proposed for C# 3.0. The fact that the C# 2.0 era example illustrates the use of generics to avoid explicit typing underlines this: the value is not in the explicit typing, it's in the static typing. This use of generics enables you to retain the benefits of static typing without the burden of explicit typing.

It seems to be a common reaction of dynamic language advocates to look at the explicit typing present in many popular statically typed languages and say "Why on earth would you want to do all that extra work?" This misses the fact that the extra typing (in the keyboard sense) isn't the point. It's the cost these particular languages impose for getting static typing. But what language researchers have known for a very long time is you don't actually need to impose that cost - it's perfectly possible to have the static typing without explicit typing in the vast majority of cases. C# is gradually embracing that. C# made some very tentative steps in that direction. The proposed direction for C# 3.0 moves much more strongly in that direction.

use of generics to avoid explicit typing - Not

[Isaac Gouy] February 28, 2006 16:33:46.262

James wrote "using generics to avoid explicit typing. Kind of makes you wonder how valuable it is..."
The example is not "using generics to avoid explicit typing"!
The example shows explicit generic types:

List[T] 
T value 
List[Int]
The example is using generics to avoid having to define separate FindAll methods for a list of integers, a list of strings, a list of chars...
In a statically-checked language with type inference (implicit types) the method might look like
FindAll(list,predicate){ ... }

Smalltalkers sneer at my LINQ-thusiasm

[Knowing.NET] February 28, 2006 20:25:14.328

Trackback from Knowing.NET Smalltalkers sneer at my LINQ-thusiasm

Follow the link for the reference

...

Fun

[Isaac Gouy] February 28, 2006 21:45:38.309

For the fun of it, here's a statically type-checked program which uses type inference and several generic library functions. For example, in the generic library function filter parameter a could be any type

filter :: (a -> .Bool) !.u:[a] -> .[a]

The program
module modulo
import StdEnv

Start = foldl toString` "" [ filter (pred m) [1..10] \\ m <- [2,3,4] ]
   where
   pred k i = i rem k == 0 
   toString` s a = s +++ "Multiples: " +++ (foldl toString`` "" a) +++ "\n"
   toString`` s j = s +++ toString j +++ " "

The program output
Multiples: 2 4 6 8 10 
Multiples: 3 6 9 
Multiples: 4 8 

C#, Isaac, C#

[Mike Anderson] March 1, 2006 5:46:19.873

Isaac wrote:

James wrote "using generics to avoid explicit typing. Kind of makes you wonder how valuable it is..."
The example is not "using generics to avoid explicit typing"!

Are you saying that you think that C#'s type checking is valuable?

If James had written "using generics to avoid the constraints imposed by explicit typing in C#", it might have been clearer.

Obviously, if the original post about LINQ was talking about Haskell and not C#, then James would have had less to say, because the example program would have been as short as yours (interestingly enough, it is pretty much the same length as his Smalltalk example).

Are you saying that...

[Isaac Gouy] March 1, 2006 12:37:06.038

There's nothing novel about know your enemy - incorrect statements just demonstrate what we don't know about generics/LINQ/C#, they don't undermine generics/LINQ/C# or promote Smalltalk.

"using generics to avoid the constraints imposed by explicit typing in C#", it might have been clearer
afaict Larry O'Brien and James were both talking about the FindAll example, where generics are used to provide polymorphism. afaict there are still explicit type declarations ( List{T}, Predicate{T} ) - so I don't understand how we can correctly claim that they are "using generics to avoid the constraints imposed by explicit typing".

"Are you saying that you think that C#'s type checking is valuable?"
Under some interpretation of valuable - yes of course. The question that interests is: What will X type checking provide for me that Y type checking doesn't and vice versa? (And at what cost?)

the same length as

[Isaac Gouy] March 1, 2006 13:07:06.270

interestingly enough, it is pretty much the same length as his Smalltalk example
Let's simplify both:

#(2 3 4) collect: [:m | 
#(1 2 3 4 5 6 7 8 9 10) select: [:i | i \\ m = 0]]
[filter (\i = i rem m == 0) [1..10] \\ m <- [2..4]]
Is the difference between these code fragments something we'd call explicit typing?
I don't understand how we could tell (without further information) if one or both or neither code fragments were from a language with static type checking or dynamic type checking.

C#, Mike, C#

[Isaac Gouy] March 1, 2006 15:23:10.535

Ian Griffiths tried to explain that the example in the article did not show LINQ & C# - I don't have the LINQ PDC 2005 Tech Preview installed so I'm making this up from some code examples (and I'm probably wrong). Maybe the array literals could be used directly with .Select and .Where ...

   int[] modulo = { 2, 3, 4 };    int[] numbers = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };     var result =        modulo.Select(m =>          numbers.Where(i => (i % m) == 0) );  

try try again

[Isaac Gouy] March 1, 2006 15:35:13.620

int[] modulo = { 2, 3, 4 };
int[] numbers = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };

var result =
   modulo.Select(m =>
      numbers.Where(i => (i % m) == 0) ); 

Yes, C#

[Mike Anderson] March 1, 2006 17:02:45.507


My first attempt got eaten. Apologies if this is a dupe.

James may have been mistaken to be talking about C#, but he was, nevertheless. His comments are in that context, and so should yours be.

You've misquoted me:
"I don't understand how we can correctly claim that they are "using generics to avoid the constraints imposed by explicit typing"."

The end of my sentence was "... in C#". Still, my statement was wrong, because, generic types are explicit types. I was actually thinking about was the burden of explicit typing in the absence of type inferencing.

ListevenList = FindAll(intList, delegate(int i) ...

I mean, come on, there's no need for that.

BTW, it's ironic that, in a discussion about the cost of explicit typing, your example (in Clean?) should contain not only a toString`, but a toString`` as well.

In Haskell...

[] March 1, 2006 17:30:26.614

main = mapM (putStrLn . ("Multiples: "++) . show) [[x | x <- [1..10], x `mod` m == 0]| m <- [2..4]]

C# using LINQ

[Isaac Gouy] March 1, 2006 18:51:42.107

(Yeah, during Preview that JavaScript editor reappears and eats tags)

"my statement was wrong, because, generic types are explicit types"
Exactly

"your example (in Clean?)"
toString` and toString`` were just my laziness in thinking of function names. I might as well have named them printString :-)
Happily I discovered my ignorance of how to write anonymous functions in Clean, so I learned something with that little example.

"I mean, come on, there's no need for that."
As Ian Griffiths tried to explain, the examples in the article really aren't using C# LINQ. afaict the simplified example would be something like this (and maybe the literal arrays could be used directly with .Select and .Where)

int[] modulo = { 2, 3, 4 };
int[] numbers = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };

var result =
   modulo.Select(m =>
      numbers.Where(i => (i % m) == 0) );

Smalltalk and C# 3 code snippets

[Isaac Gouy] March 1, 2006 21:14:14.882

After installing C# LINQ Tech Preview Update for Visual Studio 2005 RTM Release I've compiled and run the C# program, and can confirm that it works as expected.

#(2 3 4) collect: [:m | 
   #(1 2 3 4 5 6 7 8 9 10) select: [:i |  i \\ m = 0]]
C# 3.0
new int[]{2,3,4}.Select(m =>
   new int[]{1,2,3,4,5,6,7,8,9,10}.Where(i => (i % m) == 0) );