Monday, March 8, 2010

informing chrome about content

Hi everyone,

Post number two and I am already off-topic. The bug that I've been working on lately doesn't make anything start any faster (at least not right now), but it does get us closer to a world where Firefox gets to run its UI and content in different processes.

[If you haven't heard about Mozilla's grand plan to separate these into multiple processes, you better get with the program, dude! Or read that link, then you will be with the program.]

One problem with running UI (chrome) and content (webpage stuff) in different processes is that you still want to update the UI according to how the content is changing. That's how you get things like the progress and status bars that tell you the page is loading, or the Rainbow Pinwheel of Death that tells you we are about to hang or crash. (Just kidding, maintaining control of the UI when content crashes is one of the main goals of this thing.) But if chrome and content are running in different processes, you have to communicate that data between the processes.

In good old single-process Firefox, the content is loaded by the XUL <browser> element (sort of), which has an attribute called "webProgress" that lets you register listeners. Then, as navigation happens, it sends notifications to the listeners for different events, like changes in the location, status, or security of the loaded document. Other UI elements can listen for changes this way. This webProgress thing is implemented basically just by exposing the C++ object that does the navigation (it is called a DocShell).
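The shape of that listener setup can be sketched like this. (This is a hypothetical, heavily simplified stand-in for nsIWebProgress and nsIWebProgressListener, with one callback instead of the real interface's several, and none of the XPCOM plumbing.)

```cpp
#include <functional>
#include <string>
#include <vector>

// Simplified stand-in for nsIWebProgressListener: just one callback.
struct ProgressListener {
    std::function<void(const std::string&)> onLocationChange;
};

// Simplified stand-in for nsIWebProgress, as exposed by the docshell.
class WebProgress {
    std::vector<ProgressListener*> mListeners;
public:
    void AddProgressListener(ProgressListener* l) { mListeners.push_back(l); }
    // Called as navigation proceeds; fans the event out to every listener.
    void NotifyLocationChange(const std::string& url) {
        for (auto* l : mListeners) l->onLocationChange(url);
    }
};
```

A status bar, for example, would register a listener and update its text whenever onLocationChange fires.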

In multi-process Firefox, there are a couple of steps to make this happen. The <browser> element lives in the chrome process, because it is UI. But the loading happens in the content process, because it is webpage stuff! So I am taking advantage of a pair of objects (TabChild and TabParent) that are already communicating through IPDL. TabChild runs in the content process, so I register it as a listener on the docshell that is doing the loading there. Then, when it gets notifications, it sends messages to TabParent, which lives in the chrome process. TabParent implements nsIWebProgress, so listeners in the chrome process can register on it, and TabParent will notify them when it gets these messages.
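Here is a toy sketch of that relay. The "Sketch" names are mine, and the IPDL hop is faked with a direct call, but the shape is the same: the docshell notifies the child, the child messages the parent, and the parent re-notifies its chrome-side listeners.

```cpp
#include <functional>
#include <vector>

// Simplified chrome-side listener: just one progress callback.
struct Listener {
    std::function<void(int)> onProgressChange;
};

// Chrome-process side. In real life TabParent gets these as IPDL messages.
class TabParentSketch {
    std::vector<Listener*> mListeners;
public:
    void AddProgressListener(Listener* l) { mListeners.push_back(l); }
    // "Message handler": re-broadcast to chrome-side listeners.
    void RecvProgressChange(int percent) {
        for (auto* l : mListeners) l->onProgressChange(percent);
    }
};

// Content-process side. The pointer stands in for the IPDL channel.
class TabChildSketch {
    TabParentSketch* mParent;
public:
    explicit TabChildSketch(TabParentSketch* p) : mParent(p) {}
    // The docshell calls this as the page loads; we forward it "over IPC".
    void OnProgressChange(int percent) { mParent->RecvProgressChange(percent); }
};
```

The chrome-side listener never knows the event originated in another process; it just sees TabParentSketch firing notifications.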

A little fiddling around with browser.xml, and <browser>'s webProgress attribute now returns TabParent to its getters (but only if MOZ_IPC is turned on and it is a remote browser).

Some next steps: in the bug, some people said it would be nice if the chrome process could tell content to stop loading, or load in a different process, or things like that. It will be interesting to work that out, because communication is asynchronous (so maybe the content won't get the notice in time!)

Another problem is that the interface for the event listeners actually expects an object, an nsIRequest, to be passed along with the other parameters (like the status message, amount of progress made, etc.). But we are actually just passing null for now, instead of sending that through IPDL. Listeners also expect an nsIWebProgress (the object that fired the notification) to be passed. In this case that is TabParent, but the 'true' notifier is really the docshell in the content process that sent the event to TabChild, which passed it along to TabParent... but we aren't passing that docshell through IPDL either. So we'll see if my use of those dummy objects becomes a problem in the future.

Cliffhanger!

Sunday, December 20, 2009

streams in yer cache

Hi everyone. Today, I'm going to talk a little about the startup cache, which I've been working on for longer than I like to think about.

startupcache, yay! fastload, booo
The startup cache is meant to replace Mozilla's fastload architecture. The purpose of these caches is to preserve intermediate results, like compiled JavaScript and compiled XUL documents, so that we don't need to load and parse them every time the browser starts. The main goals of this project are to improve the performance of the cache and to simplify the API. Performance improvement is kind of black magic, especially when hard drive access is involved, so I'm going to ignore that part for now and talk about API simplification and some of the pitfalls we pretty much fell directly into.

One of the main problems with fastload is that it can be hard to understand, even as a client. It's implemented as a number of streams of data, multiplexed into a file. There are multiple fastload files, and each of them contains dozens of different streams of data, which are called 'documents' in fastload terminology. The fastload service is responsible for switching back and forth between documents within a file. Clients read from a stream called the 'fastload file', which the fastload service also manipulates behind the scenes to make sure they are reading the right bytes for a document.

So a client has to understand how to make a fastload file, how to make a document within the fastload file, and then how to tell the service to seek to that document within the file. After a client is done reading, he has to restore the file to point to the previous document (and sometimes restore the service to point to the previous file).

Complicated. So when Taras and I designed the startup cache, our initial design was way over at the other end of the spectrum -- basically just a map, where the keys are c-strings and the values are blobs of data. You just call getBuffer(string), and putBuffer(string, buffer, buffer length), and that's all there is to it. We didn't even offer input/output streams, just plain old buffers. This is backed by a JAR, which does some checksumming and remembers the offset of different entries for us.

This works surprisingly well. Clients can serialize complicated data structures into a stream, and just dump the stream into a buffer and hand it to the cache. Next startup, take that buffer, make it into a stream, and read your complicated data representation back out. (We made the cache architecture-dependent, so clients don't even have to worry about endianness and word-size in whatever serialization format they choose. If a user moves to a machine with a different architecture, we just start up a new cache file for that machine.)
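A client round trip might look roughly like this (the struct and its flat format are made up for illustration; the raw memcpy is only okay because, as noted above, the cache is per-architecture):

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical structure a client might want to survive across startups.
struct Entry {
    uint32_t id;
    uint32_t flags;
};

// Flatten into a byte buffer, ready to hand to putBuffer.
std::vector<char> Serialize(const Entry& e) {
    std::vector<char> buf(sizeof(Entry));
    std::memcpy(buf.data(), &e, sizeof(Entry));  // arch-specific layout is fine here
    return buf;
}

// On the next startup: take the blob from getBuffer and read it back out.
Entry Deserialize(const std::vector<char>& buf) {
    Entry e;
    std::memcpy(&e, buf.data(), sizeof(Entry));
    return e;
}
```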

serializing yer pointers
There's a fundamental problem with this, though, which is probably familiar to anyone who has tried to do serialization. The problem pops up when you try to serialize a pointer. If two serialized structures A and B have pointers to the same structure C, they'd better both point to C again when you deserialize them. But if A and B got serialized to different buffers, someone trying to deserialize A will have a hard time finding out that there's this structure B in a different buffer with a pointer to the same object C.

It's possible, of course. The client has to maintain a map of objects and the pointers which refer to those objects (only the pointers he is serializing), and then the client can serialize that map as well. So in our example, when the client sees a reference to C, that is mapped to a structure that tells the client what else is pointing to C and where to find that other pointer. So who cares, the client can take care of it -- the startup cache is meant for blobs, not for complicated object cycles.
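The bookkeeping could look roughly like this: the writer assigns each distinct object a small ID and writes a back-reference instead of a raw pointer, and the reader keeps an ID-to-object map, so two references to the same C really do come back pointing at one object. (All names and the toy int-stream "format" here are invented for illustration.)

```cpp
#include <map>
#include <memory>
#include <vector>

struct C { int value; };

class Writer {
    std::map<const C*, int> mIds;  // object -> assigned ID
public:
    std::vector<int> mOut;  // the "stream": IDs, with the payload after a first occurrence
    void WriteRef(const C* obj) {
        auto it = mIds.find(obj);
        if (it != mIds.end()) {          // seen before: just a back-reference
            mOut.push_back(it->second);
            return;
        }
        int id = static_cast<int>(mIds.size());
        mIds[obj] = id;
        mOut.push_back(id);
        mOut.push_back(obj->value);      // first occurrence carries the payload
    }
};

class Reader {
    std::map<int, std::shared_ptr<C>> mObjs;  // ID -> reconstructed object
    size_t mPos = 0;
public:
    std::shared_ptr<C> ReadRef(const std::vector<int>& in) {
        int id = in[mPos++];
        auto it = mObjs.find(id);
        if (it != mObjs.end()) return it->second;  // relink to the existing object
        auto obj = std::make_shared<C>(C{in[mPos++]});
        mObjs[id] = obj;
        return obj;
    }
};
```

Because the reader consumes the stream in the same order it was written, the first time it sees any given ID is exactly when the payload follows it; every later occurrence just relinks.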

But it turns out that both of our initial clients would need to implement this kind of map, and a third one we had in mind would need it as well. So it's probably better just to have an interface where they can call readObject(...), and if A and B both readObject(C), they magically come out with pointers to the same object. We can't provide this sort of abstraction if we just take in and hand out data blobs to our clients. We need something where we can detect that a client is trying to read a pointer and not just any ol' PRUint32, and we can do this magical relinking behind the scenes -- in short, we need to pass out object streams. And that's why streams are back in yer cache.

Currently, I'm working on getting this relinking magic done. More specifically, I did about two full days of coding without compiling (you did what??), and now I'm fixing all of the resulting compiler errors. Then, I need to write some tests so that this sort of thing doesn't happen again. Amazingly, my original deeply-flawed implementation passed all of the tests (well, okay, there aren't any real tests of this stuff anyways) and also created a functional browser.

On the bright side, using this dumb (and wrong!) implementation of the cache, I shaved 300ms off of our cold startup time...
---
Follow-up, 3/3/2010: Turns out there is a reason the naive approach worked! I will blog about this soon, and about why we decided to use data blobs instead of fancy object streams after all.