
Virtual Memory (Theory vs Application part 3)

I think I was reading Reddit a little while ago when a pretty interesting article from Communications of the ACM by Poul-Henning Kamp came up. It might be a little sensationalist, but it's a pretty valid examination of what you can do for program performance when you consider how modern operating systems manage virtual memory. Kamp spent a lot of time building Varnish, and although I'd never heard of it before, it sounds like a very impressive utility. It's an HTTP accelerator and cache whose performance apparently blows Squid out of the water.

The meat of the article discusses how conventional thinking about the implementation of algorithms and data structures on modern computers is broken, because it fails to consider the consequences of cache misses and page faults when accessing data.

On a modern multi-issue CPU, running at some gigahertz clock frequency, the worst-case loss is almost 10 million instructions per VM page fault. If you are running with a rotating disk, the number is more like 100 million instructions. What good is an O(log2(n)) algorithm if those operations cause page faults and slow disk operations? For most relevant datasets an O(n) or even an O(n^2) algorithm, which avoids page faults, will run circles around it.
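You can roughly reproduce Kamp's worst-case numbers with back-of-the-envelope arithmetic. The clock rate, issue width, and fault service times below are my own illustrative assumptions, not figures from the article:

```python
# Back-of-the-envelope estimate of instructions lost per page fault.
# All inputs are assumed, round numbers for illustration.
clock_hz = 3e9        # "some gigahertz" clock frequency
issue_width = 3       # multi-issue CPU retiring ~3 instructions per cycle

fast_fault_s = 1e-3   # ~1 ms to service a fault from a fast device
disk_fault_s = 10e-3  # ~10 ms seek + rotation on a rotating disk

fast_loss = clock_hz * issue_width * fast_fault_s    # ~9 million instructions
disk_loss = clock_hz * issue_width * disk_fault_s    # ~90 million instructions

print(f"fast device:   ~{fast_loss:.0e} instructions lost per fault")
print(f"rotating disk: ~{disk_loss:.0e} instructions lost per fault")
```

With those assumptions the CPU forgoes on the order of 10^7 instructions per fault, and 10^8 when the fault has to wait on a rotating disk, which lines up with the magnitudes in the quote.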

Kamp provides some images and diagrams to visualize some of what he's talking about. As a traditionally more systems-focused guy, I might be a little biased, but I feel that modern CS curricula pump out students who have absolutely no idea how computers actually work. Knowing the asymptotically optimal algorithm for everything is great; but if your practical implementation sucks because you're causing paging all over the place, what good is it? A reasonable mixture makes sense – and the curriculum could make room by removing that course on CVS (and maybe even calculus – barf!).

I discussed a little earlier how more intelligent practical implementations can have enormous effects over more naive solutions even though they remain asymptotically equal. Imagine wasting a hundred million instructions because your algorithm's implementation unnecessarily left some node on disk!
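One concrete way the same asymptotic algorithm can differ hugely in practice is node layout. This is a minimal sketch of my own (not Varnish's actual code, whose B-heap packs tree levels into VM pages): the same complete binary tree stored as scattered pointer-linked nodes versus one contiguous array, where parent/child moves are just index arithmetic and neighboring nodes share cache lines and pages:

```python
# Minimal sketch (assumed example): one complete binary tree, two layouts.
# The linked version allocates a separate node object per element, so a
# traversal chases pointers across the heap; the implicit-array version
# keeps every node in one contiguous block.

class LinkedNode:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None

def build_linked(values, i=0):
    """Build a pointer-based complete binary tree from level order."""
    if i >= len(values):
        return None
    node = LinkedNode(values[i])
    node.left = build_linked(values, 2 * i + 1)
    node.right = build_linked(values, 2 * i + 2)
    return node

def sum_linked(node):
    """Pre-order traversal by pointer chasing."""
    if node is None:
        return 0
    return node.value + sum_linked(node.left) + sum_linked(node.right)

def sum_array(tree, i=0):
    """Same traversal over the implicit array layout: the children of
    slot i live at 2*i + 1 and 2*i + 2, with no pointers stored at all."""
    if i >= len(tree):
        return 0
    return tree[i] + sum_array(tree, 2 * i + 1) + sum_array(tree, 2 * i + 2)

values = list(range(1023))  # a 10-level complete tree
assert sum_linked(build_linked(values)) == sum_array(values)
```

Both traversals do identical O(n) work and return identical answers; the difference is purely where the nodes live in memory, which is exactly the dimension big-O analysis ignores.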
