Start, as the ancients did, with an utterly unconstrained tohu va vohu, where anything can happen and consequently everything does. Then impose a constraint: some events are no longer allowed to happen. How do you measure how much any particular constraint reduces the variety of your world?
Wiener & Shannon worked this one out in the 1940s, inaugurating what became known as the theory of information and control. (These are just two terms for the same thing seen from two different ends of a communication: information to the receiver is control from the sender.) Let’s say that you’ve got some x varying continuously within a given range, and a probability density p(x) over said range (i.e. a function defining what portion of its time x spends near any given state). Following Wiener, we can make a few innocent (and inessential) assumptions about the distribution and define a measure of the variety on x:
$$H(x) = -\int p(x)\,\log_2 p(x)\,dx$$

This gives us a nice quantitative sense of how much wiggle room x has within the given constraints — in other words, how uncertain we are about its value at any given moment. But what constrains a thing? The obvious answer would be “some other thing”. So let’s say we have a y constrained comparably to x, though it can have a very different p(y) and doesn’t have to be measured in the same units. Then we can form a joint probability density function for them and use that to get their joint variety:
$$H(x,y) = -\iint p(x,y)\,\log_2 p(x,y)\,dx\,dy$$

You can think of this as measuring the total uncertainty on the couplet (x,y). These two things have some sort of relationship whose specifics we don’t necessarily know, but we can immediately measure the strength of the relationship thusly:
$$I(x;y) = H(x) + H(y) - H(x,y)$$

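And this right here, folks, is important: I(x;y) is called the mutual information of x and y, and tells you how tightly coupled any two variables are (it picks up any statistical dependence between them, not just linear correlation). A couple of things you may want to notice in passing are that it’s a symmetrical measure, so that I(x;y) = I(y;x), and that for a discrete x, I(x;x) = H(x), so you can think of the variety on x as its self-information. (For a continuous x the self-information is unbounded, but the intuition carries over.)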
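If you’d like to poke at these quantities yourself, here is a minimal sketch in Python. It assumes discrete probability tables rather than the continuous densities used above (sums instead of integrals), and the joint table and the function names are made up purely for illustration.

```python
from math import log2

def entropy(probs):
    """Variety H in bits of a list of probabilities (0 * log 0 counts as 0)."""
    return -sum(p * log2(p) for p in probs if p > 0)

def mutual_information(joint):
    """I(x;y) = H(x) + H(y) - H(x,y) for a joint table joint[x][y]."""
    px = [sum(row) for row in joint]            # marginal of x
    py = [sum(col) for col in zip(*joint)]      # marginal of y
    pxy = [p for row in joint for p in row]     # flattened joint
    return entropy(px) + entropy(py) - entropy(pxy)

# Hypothetical joint distribution: x and y agree 80% of the time.
joint = [[0.4, 0.1],
         [0.1, 0.4]]
px = [sum(row) for row in joint]
py = [sum(col) for col in zip(*joint)]

i_xy = mutual_information(joint)
i_yx = mutual_information([list(col) for col in zip(*joint)])  # swap roles

# x paired with itself: all probability sits on the diagonal.
i_xx = mutual_information([[px[0], 0.0], [0.0, px[1]]])

# x and y made independent: the joint is the product of the marginals.
i_indep = mutual_information([[a * b for b in py] for a in px])

print(f"I(x;y) = {i_xy:.3f}, I(y;x) = {i_yx:.3f}")
print(f"I(x;x) = {i_xx:.3f}, H(x) = {entropy(px):.3f}")
print(f"I for independent x, y = {i_indep:.3f}")
```

For this table I(x;y) comes out at about 0.278 bits, the swapped version gives the same number, the self-information equals H(x) = 1 bit, and the independent case drops to zero.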
Now, Shannon was particularly interested in the special case where x and y were the inputs and outputs of a transducer, that is, a communication channel of doubtful fidelity. In this case you can think of the maximum value of I(x;y), taken over all the ways you could distribute the input, as the capacity of the channel. One of Shannon’s most important theorems (sometimes called the Coding Theorem) states that you can correct an arbitrary amount of error introduced to your signal so long as the capacity of the channel is large enough for the rate you’re sending at (or, more usually, that given a fixed channel capacity there’s a limiting rate of transmission you can achieve without going over an error threshold). It doesn’t tell you how to do it, but it’s nice knowing that you can.
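To make “capacity as the maximum of I(x;y)” concrete, here is a rough sketch for the simplest noisy channel I know of, a binary symmetric channel that flips each bit with some fixed probability. The channel model, the flip probability of 0.1, and the brute-force search over input distributions are all my own choices for illustration, not anything from Shannon’s proof.

```python
from math import log2

def entropy(probs):
    """Variety H in bits of a list of probabilities (0 * log 0 counts as 0)."""
    return -sum(p * log2(p) for p in probs if p > 0)

def mutual_information(joint):
    """I(x;y) = H(x) + H(y) - H(x,y) for a joint table joint[x][y]."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    pxy = [p for row in joint for p in row]
    return entropy(px) + entropy(py) - entropy(pxy)

def bsc_joint(q, flip):
    """Joint p(x, y) when the input is 1 with probability q and
    the channel flips each bit with probability flip."""
    return [[(1 - q) * (1 - flip), (1 - q) * flip],
            [q * flip,             q * (1 - flip)]]

def capacity_by_search(flip, steps=1000):
    """Try input distributions q = 0, 1/steps, ..., 1 and keep the best."""
    best_k = max(range(steps + 1),
                 key=lambda k: mutual_information(bsc_joint(k / steps, flip)))
    best_q = best_k / steps
    return best_q, mutual_information(bsc_joint(best_q, flip))

flip = 0.1
best_q, capacity = capacity_by_search(flip)

# Known closed form for this channel: C = 1 - H(flip, 1 - flip).
closed_form = 1 - entropy([flip, 1 - flip])

print(f"best input distribution: P(x=1) = {best_q}")
print(f"capacity by search = {capacity:.4f} bits per use")
print(f"closed form        = {closed_form:.4f} bits per use")
```

The search lands on a fifty-fifty input and a capacity of about 0.531 bits per use, matching the closed form. A grid search only works because the input here is a single number; real channels need something smarter.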
Ross Ashby was, so far as I know, the first person to see the potentially huge implications of this theorem, and generalized it into what he called the Law of Requisite Variety, which says that just as it takes money to make money, it takes variety to destroy variety.
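One common way to write the law in the notation used here, as a sketch rather than Ashby’s own wording: call the disturbances D, the regulator’s moves R, and the outcomes E (these three symbols are my additions), and assume that for any fixed move of the regulator, different disturbances lead to different outcomes. Then

```latex
H(E) \ge H(D) - H(R)
```

That is, the variety left over in the outcomes can be pushed below the variety of the disturbances only by as much variety as the regulator itself can bring to bear.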
Recall what I said above about control being just information seen from the outbound side, and think about the fact that being able to control your environment is a necessary subgoal to any other goal you might have — you need to be able to cancel out the noise your environment throws at you in order to accomplish anything. Now, I’m this animal with a bunch of inputs (senses and such) and outputs (muscles and whatnot), and my nervous system is effectively one big transfer function that coordinates them. As a necessary condition for me to live my happy little life, I need to be able to correct any disturbances my environment throws at me. Increasing my ability to adapt is isomorphic with increasing my channel capacity.
The three terms on the right-hand side of the equation for I(x;y) above suggest three ways that I can do that: increase the variety of states of the world that I can discern (perceptual range), increase the variety of ways I can respond to them (behavioral range), and decrease the joint variety between these two things by making my behavior more determinate, since it’s not so good if half the time I see a morsel of food I randomly shove it into my own eye rather than my mouth.
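Here is a toy sketch of those three levers, under assumptions that are entirely mine: a creature that can tell n equally likely world states apart, has n responses to choose from, and with probability `noise` ignores what it sees and responds at random. Widening n raises H(x) and H(y) together; lowering `noise` shrinks the joint variety H(x,y).

```python
from math import log2

def entropy(probs):
    """Variety H in bits of a list of probabilities (0 * log 0 counts as 0)."""
    return -sum(p * log2(p) for p in probs if p > 0)

def creature_joint(n, noise):
    """Hypothetical p(x, y): n equally likely world states x, n responses y.
    With probability 1 - noise the creature gives the matching response;
    otherwise it picks one of its n responses uniformly at random."""
    return [[((1 - noise) * (x == y) + noise / n) / n for y in range(n)]
            for x in range(n)]

def varieties(joint):
    """Return H(x), H(y), H(x,y) and I(x;y) = H(x) + H(y) - H(x,y)."""
    h_x = entropy([sum(row) for row in joint])
    h_y = entropy([sum(col) for col in zip(*joint)])
    h_xy = entropy([p for row in joint for p in row])
    return h_x, h_y, h_xy, h_x + h_y - h_xy

for n, noise in [(2, 0.5), (2, 0.0), (4, 0.5), (4, 0.0)]:
    h_x, h_y, h_xy, i = varieties(creature_joint(n, noise))
    print(f"n={n} noise={noise}: H(x)={h_x:.2f} H(y)={h_y:.2f} "
          f"H(x,y)={h_xy:.2f} I(x;y)={i:.2f}")
```

With no noise, the mutual information is the full log2(n) bits (1 bit for n=2, 2 bits for n=4); with pure noise it falls to zero no matter how wide the perceptual and behavioral ranges are.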
We’d expect animals that have made a living this way for a long time to have bred-in biases toward increasing their adaptive capacity in precisely these three ways — whence curiosity, play, and habit, respectively. But each of these impulses has failure modes — such as taking in a lot of useless information, wasting effort on pointless shit, and behavioral rigidity, respectively. And these should tend to fail more often the more our environment consists of other animals trying to do the same stuff. What works well against a relatively passive environment fails pathetically against one that’s trying to control you as much as you’re trying to control it.
The consequences of this will be elaborated subsequently.
[This post will likely undergo repeated revisions for clarity.]