Hi everyone. Today, I'm going to talk a little about the startup cache, which I've been working on for longer than I like to think about.
startupcache, yay! fastload, boooThe startup cache is meant to replace Mozilla's fastload architecture. The purpose of these caches is to preserve intermediate results like compiled javascript and compiled xul documents, so that we don't need to load and parse them every time the browser starts. The main goals of this project are to improve the performance of the cache and to simplify the API. Performance improvement is kind of black magic, especially when doing hard drive access, so I'm going to ignore that part for now and talk about API simplification and some of the pitfalls we pretty much fell directly into.
One of the main problems with fastload is that it can be hard to understand, even as a client. It's implemented as a number of streams of data, multiplexed into a file. There are multiple fastload files, and each of them contains dozens of different streams of data, which are called 'documents' in fastload terminology. The fastload service is responsible for switching back and forth between documents within a file. Clients read from a stream called the 'fastload file', which is also manipulated by the fastload service behind the scene to make sure they are reading the right bytes for a document.
So a client has to understand how to make a fastload file, how to make a document within the fastload file, and then how to tell the service to seek to that document within the file. After a client is done reading, he has to restore the file to point to the previous document (and sometimes restore the service to point to the previous file).
Complicated. So when Taras and I designed the startup cache, our initial design was way over at the other end of the spectrum -- basically just a map, where the keys are c-strings and the values are blobs of data. You just call getBuffer(string), and putBuffer(string, buffer, buffer length), and that's all there is to it. We didn't even offer input/output streams, just plain old buffers. This is backed by a JAR, which does some checksumming and remembers the offset of different entries for us.
This works surprisingly well. Clients can serialize complicated data structures into a stream, and just dump the stream into a buffer and hand it to the cache. Next startup, take that buffer, make it into a stream, and read your complicated data representation back out. (We made the cache architecture-dependent, so clients don't even have to worry about endianness and word-size in whatever serialization format they choose. If a user moves to a machine with a different architecture, we just start up a new cache file for that machine.)
serializing yer pointersThere's a fundamental problem with this, though, which is probably familiar to anyone who has tried to do serialization. The problem pops up when you try to serialize a pointer. If two serialized structures A and B have pointers to the same structure C, they'd better both point to C again when you deserialize them. But if A and B got serialized to different buffers, someone trying to deserialize A will have a hard time finding out that there's this structure B in a different buffer with a pointer to the same object C.
It's possible, of course. The client has to maintain a map of objects and the pointers which refer to those objects (only the pointers he is serializing), and then the client can serialize that map as well. So in our example, when the client sees a reference to C, that is mapped to a structure that tells the client what else is pointing to C and where to find that other pointer. So who cares, the client can take care of it -- the startup cache is meant for blobs, not for complicated object cycles.
But it turns out that both of our initial clients would need to implement this kind of map, and a third one we had in mind would need it as well. So it's probably better just to have an interface where they can call readObject(...), and if A and B both readObject(C), they magically come out with pointers to the same object. We can't provide this sort of abstraction if we just take in and hand out data blobs to out clients. We need something where we can detect that a client is trying to read a pointer and not just any ol' PRUint32, and we can do this magical relinking behind the scenes -- in short, we need to pass out object streams. And that's why streams are back in yer cache.
Currently, I'm working on getting this relinking magic done. More specifically, I did about two full days of coding without compiling (you did what??), and now I'm fixing all of the resulting compiler errors. Then, I need to write some tests so that this sort of thing doesn't happen again. Amazingly, my original deeply-flawed implementation passed all of the tests (well, okay, there aren't any real tests of this stuff anyways) and also
created a functional browser.
On the bright side, using this dumb (and wrong!) implementation of the cache, I shaved 300ms off of our cold startup time...
---
Follow-up, 3/3/2010: Turns out there is a reason the naive approach worked! I will blog about this soon, and why we decide to use data blobs instead of fancy object streams, after all.