Is it ever too early to optimize?

One of the general principles that is commonly followed is avoiding premature optimization. In general, I agree with this, although I find it personally difficult: I learned to program on a machine with only 4K of RAM. These days, with RAM at $200/Gigabyte and multi-GHz machines, we have the luxury of giving priority to issues like cleanliness, modularity and development time. On the whole, this is wonderful. But all too often I see this taken to an unhealthy extreme, and slow, bloated software is the result.

In the "no premature optimization" model of the world, you ignore performance and build your system as cleanly as possible. Then you measure and tune. But even after the tuning is done, you can find that the system is still slow, with no real performance problem that you can localize. It's just a generalized, diffuse slowness: a lot of small things spread everywhere that add up. One common category is code that causes a lot of wasted memory traffic. For example, if you're doing a lot of text generation, it's easy to write code that leans heavily on string concatenation. I've seen code that uses String += char all over the place. Because Strings are immutable, each concatenation copies everything built so far, so this does a huge amount of copying and is much less efficient than appending to a StringBuffer. If you're doing heavy text generation, consider designing APIs that append to StringBuffer parameters rather than returning Strings.
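To make the difference concrete, here is a minimal Java sketch. The class and method names (TextGen, repeatConcat, repeatBuffer, appendPoint, appendPolyline) are hypothetical. The first two methods build the same string two ways; the last two show the API style suggested above, where each generator appends to a caller-supplied StringBuffer instead of returning a String.

```java
// Hypothetical class for illustration: String concatenation vs. StringBuffer.
public class TextGen {

    // Slow pattern: each += creates a new String and copies every character
    // accumulated so far, so building n characters does O(n^2) copying.
    static String repeatConcat(char c, int n) {
        String s = "";
        for (int i = 0; i < n; i++) {
            s += c;
        }
        return s;
    }

    // Faster: StringBuffer appends in place into its internal array,
    // which is presized here so it never needs to grow.
    static String repeatBuffer(char c, int n) {
        StringBuffer sb = new StringBuffer(n);
        for (int i = 0; i < n; i++) {
            sb.append(c);
        }
        return sb.toString();
    }

    // Suggested API style: the caller passes in the buffer, so nested
    // generators write into one shared buffer instead of returning
    // intermediate Strings that then get concatenated.
    static void appendPoint(StringBuffer out, double x, double y) {
        out.append('(').append(x).append(", ").append(y).append(')');
    }

    static void appendPolyline(StringBuffer out, double[] xs, double[] ys) {
        out.append('[');
        for (int i = 0; i < xs.length; i++) {
            if (i > 0) {
                out.append(", ");
            }
            appendPoint(out, xs[i], ys[i]);
        }
        out.append(']');
    }

    public static void main(String[] args) {
        StringBuffer out = new StringBuffer();
        appendPolyline(out, new double[] {0, 1}, new double[] {2, 3});
        System.out.println(out); // prints [(0.0, 2.0), (1.0, 3.0)]
    }
}
```

On Java 5 and later, StringBuilder offers the same append API without StringBuffer's synchronization and is usually the better choice for single-threaded text generation.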

Another thorny area is data structure design. Wherever possible, systems should be designed so that as much of the data structure design as possible is encapsulated. The container classes are a great example: there are many different map implementations, such as HashMap and TreeMap, with different performance characteristics, but all of them implement the Map interface. If your application uses the Map interface to declare all of its containers, then the constructor invocations can be changed to accommodate different performance demands.
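A minimal sketch of that idea, assuming Java 8 or later (for Map.merge and Map.getOrDefault); WordCounter is a hypothetical class. Because the field is declared only as Map, switching between HashMap and TreeMap touches nothing but the constructor call.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical class for illustration: the container is declared only as Map.
public class WordCounter {
    // Nothing below depends on which Map implementation is behind this field.
    private final Map<String, Integer> counts;

    WordCounter(Map<String, Integer> impl) {
        this.counts = impl;
    }

    void add(String word) {
        counts.merge(word, 1, Integer::sum);
    }

    int count(String word) {
        return counts.getOrDefault(word, 0);
    }

    Iterable<String> words() {
        return counts.keySet();
    }

    public static void main(String[] args) {
        // Only the constructor call differs:
        //   HashMap: expected O(1) lookups, unspecified iteration order
        //   TreeMap: O(log n) lookups, keys iterated in sorted order
        WordCounter sorted = new WordCounter(new TreeMap<>());
        WordCounter hashed = new WordCounter(new HashMap<>());
        for (String w : "the cat saw the dog".split(" ")) {
            sorted.add(w);
            hashed.add(w);
        }
        System.out.println(sorted.count("the") + " " + hashed.count("the")); // prints 2 2
        System.out.println(sorted.words()); // prints [cat, dog, saw, the]
    }
}
```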

But encapsulation has its limits in applications where some data structure pervades everything. For example, if you were building an editor for 3D models, almost every part of the application interacts with the model data structure. You can do a lot to hide the details, but there's only so much you can hide. If you don't think hard about the data structure right at the beginning of building the application, you're likely to end up with big problems. In this situation, a "clean" design might represent points as objects and meshes as arrays of points. But when the meshes get very large, this breaks down, because every point is a separate object with its own overhead, reached through a reference. A better-performing representation would be one or more arrays of floating-point values, with no explicit point object.
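A minimal Java sketch of the two representations. Point, ObjectMesh and FlatMesh are hypothetical names, and a real modeler would also need faces, normals and so on. ObjectMesh holds an array of references to separately allocated Point objects; FlatMesh packs the coordinates as x0, y0, z0, x1, y1, z1, ... into a single float[].

```java
// Hypothetical classes for illustration: object-per-point vs. flat arrays.
public class MeshLayouts {

    // "Clean" layout: every Point is its own heap object with its own header,
    // and the mesh's array stores references to them.
    static final class Point {
        float x, y, z;
        Point(float x, float y, float z) { this.x = x; this.y = y; this.z = z; }
    }

    static final class ObjectMesh {
        final Point[] points;
        ObjectMesh(Point[] points) { this.points = points; }

        // Follows one reference per vertex to reach its coordinates.
        void translate(float dx, float dy, float dz) {
            for (Point p : points) {
                p.x += dx;
                p.y += dy;
                p.z += dz;
            }
        }
    }

    // Flat layout: one contiguous float[] and no per-point objects.
    static final class FlatMesh {
        final float[] coords;
        FlatMesh(int pointCount) { coords = new float[3 * pointCount]; }

        int pointCount() { return coords.length / 3; }

        void setPoint(int i, float x, float y, float z) {
            coords[3 * i] = x;
            coords[3 * i + 1] = y;
            coords[3 * i + 2] = z;
        }

        float x(int i) { return coords[3 * i]; }
        float y(int i) { return coords[3 * i + 1]; }
        float z(int i) { return coords[3 * i + 2]; }

        // Walks a single array sequentially and allocates nothing.
        void translate(float dx, float dy, float dz) {
            for (int i = 0; i < coords.length; i += 3) {
                coords[i] += dx;
                coords[i + 1] += dy;
                coords[i + 2] += dz;
            }
        }
    }

    public static void main(String[] args) {
        FlatMesh mesh = new FlatMesh(2);
        mesh.setPoint(1, 1f, 2f, 3f);
        mesh.translate(1f, 1f, 1f);
        System.out.println(mesh.x(1) + " " + mesh.y(1) + " " + mesh.z(1)); // prints 2.0 3.0 4.0
    }
}
```

The price of the flat layout is that there is no Point type to pass around, so every accessor takes an index. That is exactly the kind of decision that is hard to retrofit once the whole application depends on the mesh representation.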

It's all a matter of balance and forethought.

October 14, 2003