Really Big Data

This is for those of you who have wondered what this mysterious "Density Functional Theory" that Michael keeps mentioning in his physics posts is about. Please don't be frightened by the somewhat unwieldy name: I'll try to give you a rough flavour and an "executive summary" in a fun way.

The key to many kingdoms


The dream behind the whole endeavour is that we would very much like to be able to solve Schrödinger's equation: it contains (almost) everything one might ever want to know about chemistry, materials science (which ranges from the steel industry down to nanotechnology, including the semiconductor industry), molecular biology, pharmacology, and so on. The important point here is that one could simply compute all the required information in those areas, without needing any prior empirical knowledge, just from an invariable law of nature and a few constants.

Wave functions


Sounds too good to be true? It certainly is. Without going into details about Schrödinger's equation, what you would get as a result - if you could solve it - is the so-called "wave function" of the system. How complicated that wave function is depends on how many electrons there are in the system you are studying. Let's start with something simple: the good old ethane molecule (shown below), which consists of two carbon atoms, six hydrogen atoms, and a cloud of 30 electrons moving around them.

The wave function now depends on the positions of all 30 electrons in three-dimensional space. I hope you'll forgive me one formula - here's what it looks like:

Ψ(r1, r2, ..., r30)

The wave function itself doesn't have any direct real-world interpretation, but its square (strictly speaking, the square of its absolute value) has: if you look at all those positions (r1, r2, ..., r30) at the same time, the square of the wave function tells you the probability that you will find electron number one at position number one, electron number two at position number two, and so on. (Actually, it is not possible to number electrons, even in principle, so one still needs to subject the poor wave function to what is called antisymmetrization, which makes stuff even more complicated. Too complicated for this blog post ;)

Big data


The problem now is that this innocent-looking wave function is quite a beast: suppose, for a moment, that you wanted to sample it on a grid and store it in memory. If you used just 20 grid points in each coordinate direction and double-precision numbers (8 bytes each), you would have to store 8 × 20^90 bytes, one value for every point of a grid with 3 × 30 = 90 coordinates!

That's obviously a large number, but let me briefly illustrate how large it is. The ultimate storage medium humanity could probably dream of is one that stores one byte per atom - this would make it possible to store one billion petabytes in the volume of a standard SD card (like the one in your digital camera). So how much volume would you need to store the wave function of ethane? If you do the math, it turns out that you'd need about 1.6*10^39 cubic light years. Just in case cubic light years are not among the units you use on a daily basis: according to the NASA homepage, this is roughly one million times the size of our universe (a few universes more or less don't matter anymore at this stage ;)
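If you'd like to check the byte count yourself, here is a small Python sketch. It only uses the figures quoted in this post (30 electrons, 20 grid points per direction, 8 bytes per number, one billion petabytes per SD card); the function and variable names are made up for this illustration, and the conversion from SD cards to cubic light years is left out because it depends on the card dimensions you assume.

```python
# Storage needed to tabulate the wave function on a grid.
# All figures are the ones quoted in the post.

N_ELECTRONS = 30       # electron count used in the post
GRID_POINTS = 20       # grid points per coordinate direction
BYTES_PER_VALUE = 8    # one double-precision number

# Assumption from the post: one byte per atom gives about one billion
# petabytes (10^9 * 10^15 bytes) in the volume of one SD card.
BYTES_PER_SD_CARD = 10**9 * 10**15


def wave_function_bytes(n_electrons, grid_points, bytes_per_value=BYTES_PER_VALUE):
    """Bytes for one stored value per point of a 3N-dimensional grid."""
    n_coordinates = 3 * n_electrons  # x, y and z for every electron
    return bytes_per_value * grid_points ** n_coordinates


total_bytes = wave_function_bytes(N_ELECTRONS, GRID_POINTS)
sd_cards = total_bytes // BYTES_PER_SD_CARD

print(f"coordinates: {3 * N_ELECTRONS}")
print(f"bytes:       {total_bytes:.3e}")
print(f"SD cards:    {sd_cards:.3e}")
```

Python integers have arbitrary precision, so 8 × 20^90 is computed exactly; it is a number with 118 digits.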



Insane data compression

I believe at this point it is clear that directly calculating the wave function is not going to be feasible, ever. Creative people have invented many different ways to get around the problem, and one of those ways is density functional theory, or "DFT" as its friends call it.

DFT is based on an astonishing theorem found by Pierre Hohenberg and Walter Kohn in 1964. This theorem has something to do with the density, so let me explain that first: the density is the probability of finding any electron (no matter which one) at a given position in space. It is a much simpler object than the wave function, because it is a function of only one (three-dimensional) position.

To give you a quick comparison: with the same 20 grid points per coordinate direction used above for the wave function, it would take about 60 kilobytes (8 × 20^3 = 64,000 bytes) to store the density in memory - contrast that with the million universes above!
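For the curious, the same back-of-the-envelope comparison as a minimal Python sketch. It assumes the numbers used in this post (20 grid points per direction, 8-byte numbers, 30 electrons); the helper names are hypothetical and exist only for this example.

```python
# Compare the storage for the density with that for the wave function,
# using the grid and electron count quoted in the post.

GRID_POINTS = 20       # grid points per coordinate direction
BYTES_PER_VALUE = 8    # one double-precision number
N_ELECTRONS = 30       # electron count used in the post


def density_bytes(grid_points, bytes_per_value=BYTES_PER_VALUE):
    """The density lives on an ordinary three-dimensional grid."""
    return bytes_per_value * grid_points ** 3


def wave_function_bytes(n_electrons, grid_points, bytes_per_value=BYTES_PER_VALUE):
    """The wave function lives on a grid with 3 coordinates per electron."""
    return bytes_per_value * grid_points ** (3 * n_electrons)


density_size = density_bytes(GRID_POINTS)
compression_ratio = wave_function_bytes(N_ELECTRONS, GRID_POINTS) // density_size

print(density_size)                 # 64000
print(f"{compression_ratio:.2e}")   # the ratio is exactly 20**87
```

So the "about 60 kilobytes" are 64,000 bytes, and the wave function needs 20^87 times as much room.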

Now back to the theorem of Hohenberg and Kohn: the density could of course be calculated from the wave function (if we had that in the first place). What Hohenberg and Kohn found out is that, in principle, the reverse is also true! In principle (we don't know how to do that in practice), the wave function could be reconstructed from knowledge of the density alone.
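To make the easy direction (wave function in, density out) a bit more tangible, here is a toy sketch in Python. Everything in it is an assumption made for illustration: just two particles, a one-dimensional grid with four points, spin ignored, and two made-up one-particle states. It builds an antisymmetric two-particle wave function and then sums its square over the position of the second particle. It says nothing about the hard, reverse direction.

```python
import math

# Hypothetical one-particle states on a 4-point grid. Each is normalised
# and they are orthogonal to each other; they are made up for this toy.
ORBITAL_A = [0.6, 0.8, 0.0, 0.0]
ORBITAL_B = [0.0, 0.0, 1.0, 0.0]


def antisymmetric_wave_function(a, b):
    """Two-particle table psi[i][j] = (a[i]*b[j] - b[i]*a[j]) / sqrt(2)."""
    m = len(a)
    norm = 1.0 / math.sqrt(2.0)
    return [[norm * (a[i] * b[j] - b[i] * a[j]) for j in range(m)]
            for i in range(m)]


def density_from_wave_function(psi):
    """Two-particle case: sum psi^2 over the second position, times 2."""
    return [2 * sum(value ** 2 for value in row) for row in psi]


psi = antisymmetric_wave_function(ORBITAL_A, ORBITAL_B)
density = density_from_wave_function(psi)

print("wave function entries:", sum(len(row) for row in psi))  # 16
print("density entries:      ", len(density))                  # 4
print("density:", [round(n, 6) for n in density])
print("number of particles:", round(sum(density), 6))
```

Even in this tiny example the wave function needs a 4 × 4 table while the density needs only 4 numbers, and the density adds up to the number of particles.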

In even plainer words: Those two things contain the same amount of information. Yes. The information contained in the million universes filled with mankind's ultimate storage medium can be compressed down to a few kilobytes.

I'll leave you with that thought for the time being - there is, of course, a slight catch, which is hidden in the phrase "in principle" above. Most of the research in DFT (and there is a lot) is about making this "in principle" happen in practice.