Computational Intractability as an Argument for Entropy-Driven Decision Making

BLUF: I have another argument in favor of choosing courses of action via recently produced, quantum-generated random numbers.

Recently, I wrote a long post about a decision criterion for choosing Courses of Action (CoA) under moral uncertainty given the Many-Worlds Interpretation of quantum mechanics. While this has been shot down on LessWrong and I am still working on formalizing my intuition regarding relative choice-worthiness (a combination of Bayesian moral uncertainty and traditional expected utility), as well as figuring out exactly what it means to live in a Many-Worlds universe, I tentatively believe I have another argument in favor of entropy-driven CoA selection: computational intractability.

Traditional decision theory has not focused a ton, to my knowledge, on the process of agents actually computing real world expected-utility estimates. I think the simplest models basically assume agents have infinite computations available. What decision is an agent to make when they are far from being done computing the expected-utility of different CoA? Of course, this depends on the algorithm they use, but in general, what decision should they make when the time to make a decision comes early?

In a Many-Worlds universe, I am inclined to think agents should deliberately throw entropy into their decisions. If they have explored the optimization space to the point where they are 60% sure they have found the optimal decision, they should literally seek out a quantum-mechanically generated random number (in this case between 1 and 5), and if the number is 1, 2, or 3, they should choose the course of action they are confident in; otherwise, they should choose a different promising course of action. This ensures that child worlds are appropriately diversified so that “all of our eggs are not in one basket”.

If the fundamental processes in the universe–from statistical mechanics to the strong economic forces present today in local worlds based on human evolutionary psychology–lean in favor of well-being over suffering, then I argue that this diversification is anti-fragile.

A loose analogy (there are slightly different principles at play) is investing in a financial portfolio. If you really don’t know which stock is going to take off, you probably don’t want to throw all your money into one stock. And choosing courses of action based on quantum random number generation is *the only* way to reasonably diversify one’s portfolio; even if one feels very uncertain about one’s decision, in the majority of child worlds, one will have made that very same decision. The high-level processes of the human brain are generally robust against any single truly random quantum mechanics event.

I am still working on understanding what the generic distribution of child worlds looks like under Many-Worlds, so I am far from completely certain that this decision-making principle is ideal. However, because it does seem promising, I am seeking to obtain a hardware true random number generator to experiment with this principle–I won’t learn the actual outcomes, which have to be predicted from first-principles, but I can learn how it feels psychologically to implement this protocol. At this point, it looks like I am going to have to make one. I’ll add to this post when I do.

A Decision Theory for Many-Worlds Living

Here, I describe a decision theory that I believe applies to Many-Worlds living that combines principles of quantum mechanical randomness, evolutionary theory, and choice-worthiness. Until someone comes up with a better term for it, I will refer to it as Random Evolutionary Choice-worthy Many-worlds Decisions Theory, or RECMDT.


If the Many-Worlds Interpretation (MWI) of quantum mechanics is true, does that have any ethical implications? Should we behave any differently in order to maximize ethical outcomes? This is an extremely important question that, as far as I am aware, has not been satisfactorily answered. If MWI is true and if we can affect the distribution of other worlds through our actions, it means that our actions have super-exponentially more impact on ethically relevant phenomena. I take ethically relevant phenomena to be certain fundamental physics operations responsible for the suffering and well-being associated with the minds of conscious creatures.

My Proposal

We ought to make decisions probabilistically based on sources of entropy which correspond with the splitting of worlds (e.g. particle decay) and the comparative choice-worthiness of different courses of action (CoA). By choice-worthiness, I mean a combination of the subjective degree of normative uncertainty and expected utility of a CoA. I will go into determining choice-worthiness in another post.

If one CoA is twice as choice-worthy as another, then I argue that we should commit to doing that CoA with 2:1 odds, or about 67% of the time, based on radioactive particle decay.
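As a rough sketch of that conversion, the snippet below turns relative choice-worthiness weights into selection probabilities. The function name and the dictionary format are illustrative assumptions, not an established method; it only shows the arithmetic behind "2:1 odds".

```python
from fractions import Fraction


def odds_to_probabilities(choice_worthiness):
    """Normalize relative choice-worthiness weights so they sum to 1.

    `choice_worthiness` is a hypothetical mapping from a CoA label to its
    relative weight (e.g. 2 and 1 for "twice as choice-worthy").
    """
    weights = {name: Fraction(w) for name, w in choice_worthiness.items()}
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return {name: w / total for name, w in weights.items()}


probabilities = odds_to_probabilities({"CoA A": 2, "CoA B": 1})
# CoA A gets probability 2/3 and CoA B gets 1/3.
```

Exact fractions are used here so that the 2:1 ratio is preserved without any floating-point rounding.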


Under a single unfolding of history, the traditional view is that we should choose whichever available CoA has the highest choice-worthiness. When presented with a binary decision, the thought is that we should choose the most choice-worthy option given the sum of evidence every single time. However, the fact that a decision is subjectively choice-worthy does not mean it is guaranteed to actually be the right decision; it could actually move us towards worse possible worlds. If we think we are living in a single unfolding of history but are actually living under MWI, then a significant subset of the trillions↑↑↑ (but a finite number) of existing worlds end up converging on similar futures, which are by no means destined to be good.

However, if we are living in a reality of constantly splitting worlds, I assert that it is in everyone’s best interest to increase the variance of outcomes in order to more quickly move towards either a utopia or extinction. This essentially increases evolutionary selection pressure that child worlds experience so that they either more quickly become devoid of conscious life or more quickly converge on worlds that are utopian.

As a rough analogy, imagine having a planet covered with trillions of identical, simple microbes. You want them to evolve towards intelligent life that experiences much more well-being. You could leave these trillions of microbes alone and allow them to slowly incur gene edits so that some of their descendants drift towards more intelligent/evolved creatures. However, if you had the option, why not just increase the rate of the gene edits, by say, UV exposure? This will surely push up the timeline for intelligence and well-being and allow a greater magnitude of well-being to take place. Each world under MWI is like a microbe, and we might as well increase the variance, and thus, evolutionary selection pressure in order to help utopias happen as soon and as abundantly as possible.

What this Theory Isn’t

A key component of this decision heuristic is not maximizing chaos and treating different CoAs equally, but choosing CoAs relative to their choice-worthiness. For example, in a utopian world with, somehow, 99% of the proper CoAs figured out, only in 1 out of 100 child worlds must a less choice-worthy course of action be taken. In other words, once we get confident in a particular CoA, we can take that action the majority of the time. After all, the goal isn’t for 1 world to end up hyper-utopian, but to maximize utility over all worlds.

If we wanted just a single world to end up hyper-utopian, then we would want to act in as many different ways as possible based on the results of true sources of entropy. It would be ideal to come up with any course of action, flip a (quantum) coin, and go off its results like Two-Face. Again, the goal is to maximize utility over all worlds, so we only want to explore paths in proportion to the odds that we think a particular path is optimal.

Is it Incrementally Useful?

A key component of most useful decision theories is that they are useful insofar as they are followed. As long as MWI is true, each time RECMDT is deliberately adhered to, it is supposed to increase the variance of child worlds. Following this rule just once, depending on the likelihood of worlds becoming utopian relative to the probability of them being full of suffering, likely ensures many future utopias will exist.

Crucial Considerations

While RECMDT should increase the variance and selection pressure on any child worlds of worlds that implement it, we do not know enough about the likelihood and magnitude of suffering at an astronomical level to guarantee that the worlds that remain full of life will overwhelmingly tend to be net-positive in subjective well-being. It could be possible that worlds with net-suffering are very stable and do not tend to approach extinction. The merit of RECMDT may largely rest on the landscape of energy-efficiency of suffering as opposed to well-being. If suffering is very energy inefficient compared to well-being, then that is good evidence in favor of this theory. I will write more about the implications of the energy-efficiency of suffering soon.

Is RECMDT Safer if Applied Only with Particular Mindsets?

One way to hedge against astronomically bad outcomes may be to only employ RECMDT when one fully understands and is committed to ensuring that survivability remains dependent on well-being. This works because following this decision theory essentially increases the variance of child worlds, like using birdshot instead of a slug. If one employs this heuristic only while having a firm belief in, and commitment to, a strong heuristic to reduce the probability of net-suffering worlds, then it seems that the versions of yourself in child worlds will also have this belief and be prepared to act on it. You can also only employ RECMDT while you believe in your ability to take massive action on behalf of your belief that survivability should remain dependent on well-being. Whenever you feel unable to carry out this value, you should perhaps not act to increase the variance of child worlds because you will not be prepared to deal with the worst-case scenarios in those child worlds.

Evidence against applying RECMDT only when one holds certain values strongly, however, is all the Nth-order effects of our actions. For decisions that have extremely localized effects where one’s beliefs dominate the ultimate outcome, the plausible value of RECMDT over not applying it is rather small.

For decisions with many Nth-order effects, such as deciding which job to take (which, for example, has many unpredictable effects on the economy), it seems that one cannot control for the majority of the effects of one’s actions after an initial decision is made. The ultimate effects likely rest on features of our universe (e.g. the nature of human market economies in our local group of many-worlds) that one’s particular belief has little influence over. In other words, for many decisions, one can affect the world once, but one cannot control the Nth-order effects by acting a second time. Thus, while certain mindsets are useful to hold dearly regardless of whether one employs RECMDT, it does not seem generally useful to refrain from RECMDT merely because one is not holding a particular mindset.

Converting Radioactive Decay to Random Bit Strings

In order to implement this decision theory, agents need access to a true source of entropy; pseudo-random number generators will NOT work. There are a variety of ways to implement this, such as by having an array of Geiger counters surrounding a radioactive isotope and looking at which groups of sensors get triggered first in order to yield a decision. However, I suspect one of the cheapest and most reliably random sensors would be built to implement the following algorithm from HotBits:

Since the time of any given decay is random, then the interval between two consecutive decays is also random. What we do, then, is measure a pair of these intervals, and emit a zero or one bit based on the relative length of the two intervals. If we measure the same interval for the two decays, we discard the measurement and try again, to avoid the risk of inducing bias due to the resolution of our clock.

John Walker
from HotBits
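The interval comparison described in the quote can be sketched roughly as follows. This is a minimal illustration, not HotBits' actual code: it assumes the decay times arrive as a list of clock readings, uses non-overlapping pairs of consecutive intervals, and arbitrarily maps "first interval shorter" to 1. The function name is hypothetical.

```python
from typing import Iterable, List


def bits_from_decay_times(timestamps: Iterable[float]) -> List[int]:
    """Turn decay timestamps into bits by comparing pairs of intervals."""
    times = list(timestamps)
    # Time elapsed between each pair of consecutive decays.
    intervals = [later - earlier for earlier, later in zip(times, times[1:])]
    bits = []
    # Pair the intervals up without overlap so no interval feeds two bits.
    for first, second in zip(intervals[0::2], intervals[1::2]):
        if first == second:
            # Equal readings are discarded, as the quote describes,
            # to avoid bias from the clock's resolution.
            continue
        # Assumed convention: shorter first interval -> 1, longer -> 0.
        bits.append(1 if first < second else 0)
    return bits


# Clock ticks of seven decays give six intervals: 10, 20, 10, 5, 10, 10.
sample_bits = bits_from_decay_times([0, 10, 30, 40, 45, 55, 65])
# The pairs (10, 20) and (10, 5) yield 1 and 0; the pair (10, 10) is discarded.
```

A real device would feed this from a Geiger counter's pulse times rather than a hard-coded list.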

Converting Random Bit Strings to Choices

We have a means above to generate truly random bit strings that should differ between child worlds. The next question is how to convert these bit strings into choices about which CoA to execute. This depends on the number of CoAs under consideration and the specific ratios we arrived at for comparative choice-worthiness. We simply need to add up the individual odds of all the CoAs and acquire a bit string long enough to represent at least that many distinct values. From there, we can use a simple, predetermined encoding scheme in which the base-2 number encoded in the bit string selects a particular course of action.

For example, in a scenario where one CoA is 4x as choice-worthy as another, the odds are 4:1 and sum to 5, so we need a random number that represents the values 0 to 4 with equal probability. Drawing the number 4 can mean we must do the less choice-worthy CoA, and drawing 0-3 can mean we do the more choice-worthy CoA. We need at least 3 random bits to do this. Since 2^3 is 8 and there is no way to distribute the states 5, 6, and 7 equally among the states 0, 1, 2, 3, and 4, we cannot use a bit string if its value is over 4, and must acquire new ones until we get a value of 4 or less. Once we have a bit string whose value is below the sum of the odds, we can use that value to select our course of action.

The above selection method avoids rounding errors entirely, and it shouldn’t take that many bits to implement: as long as we use the minimum number of bits needed, any given bit string has over a 50% chance of being usable. Other encoding schemes introduce rounding errors, which only add to the uncertainty already present in our choice-worthiness calculations.
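The selection procedure can be sketched as a small rejection-sampling routine. This is only an illustration under stated assumptions: `choose_coa` and `next_bit` are hypothetical names, `next_bit` stands in for whatever hardware source supplies fresh random bits, and the demo feeds in fixed bits purely to show the mechanics.

```python
from typing import Callable, Dict


def choose_coa(odds: Dict[str, int], next_bit: Callable[[], int]) -> str:
    """Pick a CoA with probability proportional to its integer odds."""
    if not odds or any(not isinstance(w, int) or w <= 0 for w in odds.values()):
        raise ValueError("odds must be positive integers")
    total = sum(odds.values())           # e.g. 4 + 1 = 5 for 4:1 odds
    n_bits = (total - 1).bit_length()    # fewest bits that can cover 0..total-1
    while True:
        value = 0
        for _ in range(n_bits):
            value = (value << 1) | (next_bit() & 1)
        if value < total:
            break                        # usable draw; otherwise draw again
    # Walk the CoAs in order, giving each a block of values equal to its odds.
    for name, weight in odds.items():
        if value < weight:
            return name
        value -= weight
    raise AssertionError("unreachable: value is always below the total")


# Demo with fixed bits: 110 (6) is rejected, then 100 (4) is accepted.
demo_bits = iter([1, 1, 0, 1, 0, 0])
choice = choose_coa(
    {"more choice-worthy": 4, "less choice-worthy": 1},
    lambda: next(demo_bits),
)
# The value 4 falls in the single slot reserved for the less choice-worthy CoA.
```

Because rejected draws are simply thrown away, every accepted value from 0 to 4 stays equally likely, which is what keeps the 4:1 ratio exact.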

What Does Application Look Like?

I think everyone with solid choice-worthy calibrating ability should have access to truly random bits to choose courses of action from.

Importantly, the time of the production of these random bits is relevant. A one-year-old random bitstring captured from radioactivity is just as random as one captured 5 seconds ago, but employing the latter is key for ensuring the maximum number of recent sister universes make different decisions.

Thus, people need access to recently created bit strings. These could come from a portable, personal Geiger counter, but they could also come from a centralized Geiger counter in, say, the middle of the United States. The location does not matter as much as the recency of bit production. Importantly, however, bit strings should never be reused: a reused bit string is not as random as a new one, because whatever made you decide to reuse it is non-random information.

Can We Really Affect the Distribution of Other Worlds through Our Actions?

One may ask: since everything, including our brains, is quantum mechanical, can we really affect the distribution of child worlds through our intentions and decisions? This raises the classic problem of free will and our place in a deterministic universe. I think the simplest question to ask is: do our choices have an effect on ethically-relevant phenomena? If the answer is no, then why should we care about decision theory in general? I think it’s useful to think of the answer as yes.

What if Many Worlds Isn’t True?

If MWI isn’t true, then RECMDT optimizes for worlds that will not exist at the potential cost to our own. This may seem to be incredibly dangerous and costly. However, as long as people make accurate choice-worthiness comparisons between different CoAs, then I will actually argue that adhering to RECMDT is not that risky. After all, choice-worthiness is distinct from expected-utility.

It would be a waste to have people, in a binary choice of actions with one having 9x more expected-utility than the other, choose the action with less expected-utility even 10% of the time. However, it seems best, even in a single unfolding of history, that where we are morally uncertain, we should actually cycle through actions based on our moral uncertainty via relative choice-worthiness.

By always acting to maximize choice-worthiness, we risk not capturing any value at all through our actions. While I agree that we should maximize expected-utility in both one shot and iterative scenarios alike and be risk neutral assuming we adequately defined our utility function, I think that given the fundamental uncertainty at play in a normative uncertainty assessment, it is risk neutral to probabilistically decide to implement different CoAs relative to their comparative choice-worthiness. Importantly, this is only the ideal method if the CoAs are mutually exclusive–if they are not, one might as well optimize for both moral frameworks.

Hence, while I think RECMDT is true, I also think that even if MWI is proven false, a decision theory exists which combines randomness and relative choice-worthiness. Perhaps we can call this Random Choice-worthy Decision Theory, or RCDT.

I am still actively working on this post, but I am excited enough about this idea that I didn’t want to wait to post it. Let me know what you think of this!

Subjective Probability

BLUF: I found an essay by Nick Bostrom that perfectly coincides with my ideals towards Bayesian probability and how I aspire to consciously hold degrees-of-belief and continually update on evidence.

For me, belief is not an all-or-nothing thing—believe or disbelieve, accept or reject. Instead, I have degrees of belief, a subjective probability distribution over different possible ways the world could be. This means I am constantly changing my mind about all sorts of things, as I reflect or gain more evidence. While I don’t always think explicitly in terms of probabilities, I often do so when I give careful consideration to some matter. And when I reflect on my own cognitive processes, I must acknowledge the graduated nature of my beliefs.

-Nick Bostrom, 2008, Response to the 2008 EDGE Question: “What Have You Changed Your Mind About?”

Unpopular Ideas: Theory and Links

I was inspired by some of Julia Galef’s posts and am going to collect some unpopular ideas here as it’s paramount we consider ideas far outside the status-quo on a continual basis.

Just as many ideas we hold as true today were counter-intuitive in the past, we will probably eventually shift towards accepting a number of currently counter-intuitive beliefs in the future. This seems almost inevitable unless you think we have reached the pinnacle of human progress, but that is probably (not to mention hopefully) not the case. Since an eventual change in many of our beliefs is almost inevitable, we should hasten the progress and let “that which can be destroyed by the truth, be destroyed by the truth.”

One way this can happen quickly is by taking unconventional and unpopular ideas seriously and trying to make them work. Not only will we find hidden truths sometimes, but even when we do disprove these unconventional ideas, we still benefit from decoupling a bit from the status-quo and thinking in a new way.

A counterargument to considering unconventional ideas is that some of them are information hazards and that merely by considering them, one will inevitably be led astray. I acknowledge that this can be the case for some people, and even for myself if I somehow forget my priors or fail to be intellectually rigorous in my explorations. That being said, if some of our most rational people cannot safely consider unpopular ideas, humanity probably doesn’t have the intelligence to survive long-term anyways.

I am not endorsing these ideas! I disagree with many of them and am merely collecting them for the sake of intellectual thoroughness.

Julia Galef’s Lists of Unpopular Ideas about:

My List: Coming soon once I get through Julia’s.