Scripting flamewar

I made some comments recently about scripting languages that generated flame storms in a variety of places, JDJ and Artima for example. Yes, I did say those things. But there's a lot of context missing, and they're the flippant soundbite version of what should have been a long and careful explanation that could easily mushroom into a series of PhD theses. Amongst all the flamage there are all kinds of generalizations about "scripting languages" versus "compiled languages". My big problem with a lot of it is simply that these two polarizing categories are a pretty poor way of capturing the distinctions between language designs. The terms are almost as goofy as "Republican" versus "Democrat": taking huge multi-dimensional spaces of choices on different issues, then combining and simplifying them all down to a brutally simple binary choice.

Over the years I've built quite a lot of scripting systems. I've also built a number of compilers for non-scripting languages. Given enough beer I'll even admit to having implemented a Cobol compiler for money in the deep dark past. But I've done more scripting systems than non-scripting systems.

There are issues in the tradeoffs between the two linguistic spaces all over the place. Someday I'd like to write a long tour through them, but there just aren't enough hours in a day. I have a hard time finding enough time to do even trivial blogging: being truly thoughtful takes a lot of time. But I'll try to cover a few.... For now, I'll make the generalization that "scripting language" means one that is interpreted with dynamic runtime typing, and that the other camp is languages that are compiled to machine code and have static compile-time typing. This is a broad over-simplification, but it matches pretty well what goes on in common conversations.

Raw execution performance: One of the usual arguments for scripting languages having acceptable performance is that the overhead of interpretation and dynamic typing doesn't matter: the performance of the system is dominated by other factors, typically IO and the language primitives. For example, Perl apps usually spend the majority of their time in file IO and string primitives. I've made this argument strongly in the past, and it's quite valid. But having observed developers' usage patterns, the two most common things that erode the argument are:

  1. Developers start doing things that are outside of what the language primitives are good at. For example, PostScript has great primitives for rendering. So long as you're doing rendering, it flies like the wind. But then someone goes and writes a game that's heavily based on rendering, and a piece of it needs to do collision detection between missiles and targets. Physics in PostScript: a bad idea.
  2. Developers start clamoring for new primitives. Some are too specialized to be reasonable ("I want a fast collision engine"); some are rational ("object-oriented programming has become the dominant style in PostScript, but the OO model is implemented in PostScript as a library and is slow").
There are a bunch of ways to respond:
  1. "Buzz off". My personal favorite. Don't use a hammer to tighten a bolt.
  2. Start adding more primitives. This is hopeless: you drown in wave after wave of requests.
  3. Put in a facility for developers to write their own primitives, e.g. link hunks of C code into the interpreter. This can work pretty well, but the linguistic-universe schism can be problematic: not only do developers then need to learn two environments, but because of the cross-language calls each language can impose difficulties on the other (e.g. a language with a good garbage collector has a hard time interacting well with C's malloc/free regime). There's a sketch of what that seam looks like just after this list.
  4. Make the "scripting language" fast enough to implement primitives. It tends to end up not looking much like a scripting language. This is roughly the road I went down with Java: to see how close to a scripting experience I could get while being able to compile obvious statements like "a=b+c" into one instruction in the common case (there's a bytecode sketch after this list, too). I could have gone down the road of making declarations optional, but I intentionally didn't.
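To make response 3 concrete, here's a minimal sketch of what that seam looks like, assuming JNI as the linking mechanism; the class, method and library names (FastCollide, hit, fastcollide) are made up for illustration. The Java side declares the primitive, a hunk of C on the other side implements it (a function named Java_FastCollide_hit, per the JNI naming convention), and the garbage-collected world and the malloc/free world meet at exactly that boundary.

    // Hypothetical "fast collision" primitive: declared here, implemented in C.
    public class FastCollide {
        static {
            // Loads libfastcollide.so / fastcollide.dll, the compiled hunk of C code.
            System.loadLibrary("fastcollide");
        }

        // No Java body: the implementation is a C function (Java_FastCollide_hit)
        // whose data lives in malloc/free land, outside the collector's reach.
        public static native boolean hit(double x1, double y1,
                                         double x2, double y2, double radius);
    }

And to make response 4 concrete: because the variables have declared types, a statement like "a=b+c" on ints compiles to a handful of stack operations in which the add itself is a single iadd instruction, with no runtime type check or dispatch. Roughly what javap -c reports for a little method like this (bytecode shown in the comments):

    class Arith {
        // Declared int types mean the add is one bytecode instruction, not a dynamic lookup.
        int sum(int b, int c) {
            int a = b + c;   // iload_1; iload_2; iadd; istore_3
            return a;        // iload_3; ireturn
        }
    }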
Worrying about scale, evolution and testing: strong compile-time types dramatically improve the ability of the tools (compiler, IDE, ...) to do global checking and manipulation. Some people like this. Some don't. There's a fair amount of evidence that these sorts of facilities are really helpful in dealing with systems that get large, evolve over a long time or require large teams. The most-often-cited reason for preferring dynamic typing is that it can make the development process easier: no need to bother typing all those yucky declarations.
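A tiny made-up example of the kind of global checking I mean: when a widely used method grows an extra argument during an evolution of the system, a statically typed compiler (and the IDE sitting on top of it) flags every stale call site before anything runs, instead of leaving them to be found by tests or by users.

    // Hypothetical library class that evolved: render() grew a scale parameter.
    class Shape { }

    class Renderer {
        void render(Shape s, double scale) { /* ... */ }
    }

    class Caller {
        void draw(Renderer r, Shape s) {
            // r.render(s);      // stale one-argument call: javac rejects it outright,
            //                   // and the IDE can list every such site across the codebase
            r.render(s, 1.0);    // the fix the type system forces you to make
        }
    }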

There are a variety of middle-ground solutions:

  • One is to allow, but not require, declarations. This is particularly common in Lisp dialects where you can sprinkle declarations like (fixnum i j k) where needed to cause the code generators to generate efficient code. The big problem is figuring out the "where needed" bit. Sometimes it can be dicey to figure out all the places where there are issues. Often there are no real performance hotspots: no loops burning up all the time. Rather, it's a "peanut butter" problem where it's spread everywhere: you have to tune everything to make a difference.
  • Another is to make the language strongly/statically typed, but to make the declaration process more transparent. One extreme point in this space is ML, which does extensive type inferencing (sometimes leading to developer confusion when they can't figure out what the inference engine is doing). An operator that I've long wanted to add to Java is "declare and assign" (sketched just below):
    v:=e Declares v to have the same type as e, and assigns e to v. "x:=r*sin(t);" would be roughly equivalent to "double x = r*sin(t);".
    v:==e Just like :=, except that it makes v final.
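
    A rough sketch of how that would read; the := and :== lines are proposed syntax that javac doesn't accept, so they appear as comments above the plain-Java spelling you have to write today (the class is just scaffolding for the example):

        class DeclareAndAssign {
            void demo() {
                double r = 2.0, t = Math.PI / 4;

                // Proposed:  x := r * Math.sin(t);    // x takes its type, double, from the expression
                double x = r * Math.sin(t);            // today's spelling

                // Proposed:  y :== r * Math.cos(t);   // like :=, but y is also final
                final double y = r * Math.cos(t);      // today's equivalent

                System.out.println(x + ", " + y);
            }
        }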

    There's a lot more to say, but this is enough for today...

March 23, 2006