Translation segmentation

I've been hacking furiously on doc.java.sun.com, the community translation site for Java documentation (even if you're not interested in translation, it's still an interesting way to read javadoc). There's been a pile of bug fixing. The biggest change that people using the site will notice is that when you volunteer to do translations, the site now breaks large chunks of text down into more manageable segments. This isn't the full-up segmentation that professional translators need, but it's a step in the right direction. I also got a start on a statistics module. And I poured all of the JavaEE 5 javax.* sources into it. With all of JavaSE and JavaEE in it, it's clear that I need to work on the navigation facilities.
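To give a rough idea of what breaking text into segments can look like, here is a minimal sketch that splits a chunk of documentation into sentences with the JDK's java.text.BreakIterator. It is not the site's actual code: the Segmenter class and its segment method are hypothetical names, and splitting on sentence boundaries is only an assumption about what a "manageable segment" might be.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not taken from the doc.java.sun.com sources.
class Segmenter {
    /** Splits text into sentence-sized segments, trimming whitespace and dropping empty pieces. */
    static List<String> segment(String text, Locale locale) {
        BreakIterator it = BreakIterator.getSentenceInstance(locale);
        it.setText(text);
        List<String> segments = new ArrayList<>();
        int start = it.first();
        // Walk the sentence boundaries; each [start, end) span is one candidate segment.
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String piece = text.substring(start, end).trim();
            if (!piece.isEmpty()) {
                segments.add(piece);
            }
        }
        return segments;
    }

    public static void main(String[] args) {
        String doc = "Hello world. This is a test.";
        for (String s : segment(doc, Locale.ENGLISH)) {
            System.out.println(s);
        }
    }
}
```

Sentence splitting like this is far cruder than what professional translation tools do, which is the gap the paragraph above admits to.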

Some folks have looked at the sources and asked "where's the database?" I'm a big believer in RAM. If I used a real database, some pages would require close to 1000 database lookups to construct. Even with all kinds of caching, this turns into far too much time wasted waiting for a disk to spin. Instead, I just use in-memory data structures. Every user action (a translation or a vote) adds to the data structure and to a log file. When I reboot the server, I just reprocess the logs to rebuild the data structures. Given the statistics and semantics of this application, this technique works well. It's moderately straightforward to make this work well on clusters. And it's fast.
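The log-and-replay idea can be sketched in a few lines. This is only an illustration under stated assumptions, not the site's real code: the TranslationStore class, the tab-separated log format, and the rule that IDs and translations contain no line breaks are all made up for the example.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.HashMap;
import java.util.Map;

// Hypothetical store: all reads are served from RAM, every write is also appended to a log.
class TranslationStore {
    private final Map<String, String> translations = new HashMap<>();
    private final Map<String, Integer> votes = new HashMap<>();
    private final Path log;

    /** On startup, rebuild the in-memory state by replaying the log, if one exists. */
    TranslationStore(Path log) throws IOException {
        this.log = log;
        if (Files.exists(log)) {
            for (String entry : Files.readAllLines(log, StandardCharsets.UTF_8)) {
                apply(entry);
            }
        }
    }

    synchronized void translate(String segmentId, String text) throws IOException {
        record("T\t" + checkId(segmentId) + "\t" + checkLine(text));
    }

    synchronized void vote(String segmentId) throws IOException {
        record("V\t" + checkId(segmentId));
    }

    synchronized String translation(String segmentId) {
        return translations.get(segmentId);
    }

    synchronized int votes(String segmentId) {
        return votes.getOrDefault(segmentId, 0);
    }

    // Append to the log first, then update memory, so a replay never misses an applied action.
    private void record(String entry) throws IOException {
        Files.write(log, (entry + "\n").getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        apply(entry);
    }

    // The same method handles live actions and replayed log lines; unknown lines are ignored.
    private void apply(String entry) {
        String[] f = entry.split("\t", 3);
        if (f.length == 3 && f[0].equals("T")) {
            translations.put(f[1], f[2]);
        } else if (f.length == 2 && f[0].equals("V")) {
            votes.merge(f[1], 1, Integer::sum);
        }
    }

    // Assumption of this sketch: one log line per action, so line breaks are rejected.
    private static String checkLine(String s) {
        if (s.indexOf('\n') >= 0 || s.indexOf('\r') >= 0) {
            throw new IllegalArgumentException("line breaks are not supported in this sketch");
        }
        return s;
    }

    // IDs additionally must not contain tabs, since tab is the field separator.
    private static String checkId(String s) {
        if (s.indexOf('\t') >= 0) {
            throw new IllegalArgumentException("tabs are not allowed in segment IDs");
        }
        return checkLine(s);
    }
}
```

Because live updates and replay go through the same apply method, a restarted server ends up in the same state it had before, as long as the log is intact. Page rendering then only touches hash maps, never the disk.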

October 20, 2006