# Explicit Optimization of Global Strategy (Fixing a Bug in UDT1)

When describing UDT1 solutions to various sample problems, I’ve often talked about UDT1 finding the function S* that would optimize its preferences over the world program P, and then return what S* would return, given its input. But in my original description of UDT1, I never explicitly mentioned optimizing S as a whole, but instead specified UDT1 as, upon receiving input X, finding the optimal output Y* for that input, by considering the logical consequences of choosing various possible outputs. I have been implicitly assuming that the former (optimization of the global strategy) would somehow fall out of the latter (optimization of the local action) without having to be explicitly specified, due to how UDT1 takes into account logical correlations between different instances of itself. But recently I found an apparent counter-example to this assumption.

(I think this “bug” also exists in TDT, but I don’t understand it well enough to make a definite claim. Perhaps Eliezer or someone else can tell me if TDT correctly solves the sample problem given here.)

Here is the problem. Suppose Omega appears and tells you that you have just been copied, and each copy has been assigned a different number, either 1 or 2. Your number happens to be 1. You can choose between option A or option B. If the two copies choose different options without talking to each other, then each gets $10, otherwise they get $0.

Consider what happens in the original formulation of UDT1. Upon receiving the input “1”, it can choose “A” or “B” as output. What is the logical implication of S(1)=”A” on the computation S(2)? It’s not clear whether S(1)=”A” implies S(2)=”A” or S(2)=”B”, but actually neither can be the right answer.

Suppose S(1)=”A” implies S(2)=”A”. Then by symmetry S(1)=”B” implies S(2)=”B”, so both copies choose the same option, and get $0, which is clearly not right.

Now instead suppose S(1)=”A” implies S(2)=”B”. Then by symmetry S(1)=”B” implies S(2)=”A”, so UDT1 is indifferent between “A” and “B” as output, since both have the logical consequence that it gets $10. So it might as well choose “A”. But the other copy, upon receiving input “2″, would go though this same reasoning, and also output “A”.

The fix is straightforward in the case where every agent already has the same source code and preferences. UDT1.1, upon receiving input X, would put that input aside and first iterate through all possible input/output mappings that it could implement and determine the logical consequence of choosing each one upon the executions of the world programs that it cares about. After determining the optimal S* that best satisfies its preferences, it then outputs S*(X).

Applying this to the above example, there are 4 input/output mappings to consider:

S

_{1}(1)=”A”, S_{1}(2)=”A”S

_{2}(1)=”B”, S_{2}(2)=”B”S

_{3}(1)=”A”, S_{3}(2)=”B”S

_{4}(1)=”B”, S_{4}(2)=”A”

Being indifferent between S_{3} and S_{4}, UDT1.1 picks S*=S_{3} and returns S_{3}(1)=”A”. The other copy goes through the same reasoning, also picks S*=S_{3} and returns S_{3}(2)=”B”. So everything works out.

What about when there are agents with difference source codes and different preferences? The result here suggests that one of our big unsolved problems, that of generally deriving a “good and fair” global outcome from agents optimizing their own preferences while taking logical correlations into consideration, may be unsolvable, since consideration of logical correlations does not seem powerful enough to always obtain a “good and fair” global outcome even in the single-player case. Perhaps we need to take an approach more like cousin_it’s, and try to solve the cooperation problem from the top down. That is, by explicitly specifying a fair way to merge preferences, and simultaneously figuring out how to get agents to join into such a cooperation.

- A model of UDT with a concrete prior over logical statements by 28 Aug 2012 21:45 UTC; 62 points) (
- Comparison of decision theories (with a focus on logical-counterfactual decision theories) by 16 Mar 2019 21:15 UTC; 61 points) (
- Conceptual Problems with UDT and Policy Selection by 28 Jun 2019 23:50 UTC; 51 points) (
- Policy Alignment by 30 Jun 2018 0:24 UTC; 50 points) (
- Original Research on Less Wrong by 29 Oct 2012 22:50 UTC; 47 points) (
- Notes on logical priors from the MIRI workshop by 15 Sep 2013 22:43 UTC; 30 points) (
- Predictability of Decisions and the Diagonal Method by 9 Mar 2012 23:53 UTC; 29 points) (
- Single player extensive-form games as a model of UDT by 25 Feb 2014 10:43 UTC; 25 points) (
- Consequentialist Formal Systems by 8 May 2012 20:38 UTC; 21 points) (
- UDT as a Nash Equilibrium by 6 Feb 2018 14:08 UTC; 18 points) (
- 31 Aug 2017 15:33 UTC; 14 points) 's comment on The Doomsday argument in anthropic decision theory by (
- 23 Jul 2018 7:51 UTC; 13 points) 's comment on Let’s Discuss Functional Decision Theory by (
- 26 Aug 2012 9:08 UTC; 10 points) 's comment on An angle of attack on Open Problem #1 by (
- 1 Jul 2018 11:54 UTC; 9 points) 's comment on Another take on agent foundations: formalizing zero-shot reasoning by (
- Conditioning on Conditionals by 17 Aug 2017 1:15 UTC; 7 points) (
- 23 Dec 2013 2:45 UTC; 7 points) 's comment on Building Phenomenological Bridges by (
- 17 Jan 2019 8:07 UTC; 6 points) 's comment on CDT=EDT=UDT by (
- 3 Jul 2018 8:19 UTC; 5 points) 's comment on The Prediction Problem: A Variant on Newcomb’s by (
- 23 Jul 2014 18:29 UTC; 5 points) 's comment on Welcome to Less Wrong! (6th thread, July 2013) by (
- 30 Jul 2010 4:16 UTC; 4 points) 's comment on Metaphilosophical Mysteries by (
- Functional Side Effects by 14 Jan 2014 20:22 UTC; 4 points) (
- 1 Jul 2018 17:50 UTC; 4 points) 's comment on Another take on agent foundations: formalizing zero-shot reasoning by (
- 13 Feb 2018 23:09 UTC; 2 points) 's comment on UDT as a Nash Equilibrium by (
- 8 Aug 2018 23:58 UTC; 2 points) 's comment on A Short Note on UDT by (
- 10 Jun 2010 20:59 UTC; 2 points) 's comment on UDT agents as deontologists by (
- 13 Jun 2010 18:24 UTC; 2 points) 's comment on UDT agents as deontologists by (
- 1 Aug 2010 8:17 UTC; 1 point) 's comment on AI cooperation in practice by (
- 21 Sep 2017 8:39 UTC; 1 point) 's comment on Publication of “Anthropic Decision Theory” by (
- 12 Jun 2010 17:12 UTC; 1 point) 's comment on UDT agents as deontologists by (
- 11 Jun 2010 17:21 UTC; 0 points) 's comment on UDT agents as deontologists by (
- 11 Jun 2010 20:39 UTC; 0 points) 's comment on UDT agents as deontologists by (
- 11 Jun 2010 20:48 UTC; 0 points) 's comment on UDT agents as deontologists by (
- 11 Jun 2010 17:44 UTC; 0 points) 's comment on UDT agents as deontologists by (
- 13 Jun 2010 17:47 UTC; 0 points) 's comment on UDT agents as deontologists by (
- 13 Jun 2010 21:13 UTC; 0 points) 's comment on UDT agents as deontologists by (
- 28 Sep 2017 6:58 UTC; 0 points) 's comment on Naturalized induction – a challenge for evidential and causal decision theory by (

Set aside different preferences, because I have no idea how to deal with that. It seems like just having different source codes makes things very difficult.

You used “1” and “2″ as your indexical labels, which come with a very obvious order. But suppose that Omega labeled one agent ♠ and the other agent ♣. If you and your opposite have different source codes, you have no idea how your opposite internally represents the symbol that they received. For example, you certainly have no guarantee that the two of you order these in lexicographically the same way. So how could you possibly coordinate on a tie-breaking strategy?

If you are incapable of consistently putting things into order your winnings on this type of problem go down.

Resource-limited agents should not stress too much over this—such problems are mainly of interest to philosophers.

Where do you think preference lives? It’s but a property of the source code, or rather, a way of looking at the source code. If you change the source code, preference changes as well (to some extent).

I agree that preferences live in the source code. Having different preferences implies having different source codes. I also agree that preferences are but a way to interpret source code, so that any two source codes might be said to have the same preferences with respect to

somepreference scheme.So my comment was assuming that we already have an implicit way of mapping source codes to preference schemes, and that that mapping is not injective. That is, different source codes might have the same preference scheme. But these different source codes might still use different internal representation schemes for representing their input.

I think that my issue disappears if each agent has access to the other’s source code.

I don’t understand what the above means (and so can’t readily be in agreement with it). (“Preference scheme”?)

I expect preference mapping to be “essentially” injective (possibly disregarding only stuff that doesn’t play any role in the algorithm, including the

wayit computes as well as the externally visible behavior). Preference is what the programdoes, and it’s everything that the program does. (But what the program does isn’t necessarilypreferableaccording to that same program seen as preference.)At this date almost four years later, I’m not exactly sure what I meant, which is a pretty strong indictment of my ability to write clearly.

I

thinkthat I meant the following: Take the set of all possible outcomes, partition this set of outcomes into equivalence classes, and then put an order on the resulting set of equivalence classes. The ordered set of equivalence classes is what I was calling a “preference scheme”. (… I think.)ETA: On further thought, I don’t think that that’s exactly what I meant, but I honestly don’t know what I

didmean. I’d like to think that I could have explained myself better at the time, but I didn’t write enough so that even I-now could reconstruct what that explanation would have been.Conventionally, spades comes before clubs:

Bridge: spades, hearts, diamonds, clubs;

Poker (generally) spades, hearts, clubs, diamonds.

Except that generally you list them going

up—clubs, diamonds, hearts, spades.Reference? They are not generally listed that way on the internet:

Google “spades, hearts, diamonds, clubs” − 4,090

Google “clubs, diamonds, hearts, spades” − 2,950

Huh, that surprises me. I was going purely by personal experience, plus the fact that bidding goes from low to high in Bridge; that’s the order I would expect people to list the suits in, because that’s the order they occur in.

Those numbers are not hugely different—it might be more accurate to say that “there is no reliable consensus on order”.

To deal with this kind of problem, you want the best ordering you can find.

Not everyone has to agree on it.

The other agent, who has different source code to you, has to agree on it. If it were you and Sniffnoy playing the game …

In the post, it said:

“Suppose Omega appears and tells you that you have just been copied”.

Oh,I see the problem. I was talking about Tyrrell_McAllister’s question upthread, in which the assumption of identical source code (i.e. copying) is dropped.If you don’t know much about the other agent—except that it is also trying to win—I figure you should also probably just do the best you can to pick the most mutually-obvious ordering, hoping that they will be doing much the same. Sometimes, it won’t work out—but that is doing as well as you can.

That’s assuming linear utility. If the most important thing is to consistently get at least a few points, then randomness may be a better strategy.

I’d use alphabetical ordering, and assume my opposite would too. So clubs would come before spades, black before white, etc.

Now, if your opponent doesn’t use the same language (or doesn’t use a language, only unicode symbols), it almost becomes a game of chance, like “here’s an aarvark and a wapiti, the same animals have been shown to some (non-english speaking) Chinese guy, you each get to choose one, if the two of you choose different ones you get a hundred bucks”.

You can’t,

unlessyou can communicate in some way, such as by talking to eachother or reading eachother’s source code. In that case it’s easy.You want a general algorithm that plays arbitrary games against other copies of itself “fairly”. If utilities are transferable, it’s easy, but note that this solution

wasn’tderived from consideration of logical correlations alone. If utilities are non-transferable, I’ve given up on the problem because the literature offers bewilderingly many different solutions and no clear winner.You could try a simpler problem as a stepping stone: blame assignment. (I’m currently trying to solve this myself for a future post but what the hell, let a million flowers bloom.) In a given play of a multiplayer game, how much credit/blame should we assign to player A for the payoff that player B got? Concrete example: how many people did one voter personally kill by voting for Hitler and how is this responsibility shared between them and Hitler? I believe this should be easier than NTU fairness but it’s still too tricky for me.

It’s misleading when you are calling efficient decision-making “fairness”. It’s not about giving the poor an equal share, it’s about making sure you get the best deal there is. (And such an algorithm should be able to play arbitrary games not just against other copies of itself—by the way, what exactly is a copy is we don’t have a formal setting?)

Completely agreed on first point, I view it as something like “fairness according to bargaining power”.

In a problem this hard, I still consider full source code visibility a reasonable starting point. I’ll be very suspicious if a proposed solution works in some other formal setting but

doesn’twork under full source visibility.This is essentially PD (in the aspects relevant to this post), but without magical identification of Player1.Cooperate and Player2.Cooperate by virtue of using the same label “Cooperate”. Consider what happens with your thought experiment if the rule given by Omega is “You should give the same answer”, given that they already have differentiating info (assigned number). This distinction shouldn’t matter.

Recursion through decision-making of all relevant agents seems conceptually indispensable. When Player1 sees itself again through the eyes of Player2, symmetry or asymmetry between the players (as opposed to identity of the recursive copies of each player) becomes irrelevant.

Consider: if players are different, then Player1 knows about Player2 that knows about Player1 (recursion, a site of TDT-like acausal control); if players are the same, then Player1 knows about identical Player2 (recursion at the first step). If we go the first road, not much is lost, but we get more generality.

Ok, there are currently two suggested ways for agents to achieve logical correlation: through mutual prediction, or by using the same source code, and it sounds like you’re saying that the first method is more powerful. But so far I don’t really see how it can work at all. Can you explain how the mutual prediction approach would solve the problem given in my post, or any other problem that might show its advantage?

Mitchell_Porter explained in detail how he dealt with the problem. Perhaps consider his comment.

The two of you seem to be missing the point of this post. This sample problem isn’t hard or confusing in and of itself (like Newcomb’s Problem), but merely meant to illustrate a limitation of the usefulness of logical correlation in decision theory. The issue here isn’t whether we can find some way to make the right decision (obviously we can, and I gave a method in the post itself) but whether it can be made through consideration of logical correlation alone.

More generally, some people don’t seem to get what might be called “decision theoretic thinking”. When some decision problem is posted, they just start talking about how they would make the decision, instead of thinking about how to design an algorithm that would solve that problem and every other decision problem that it might face. Maybe I need to do a better job of explaining this?

Mitchell_Porter didn’t

justsolve the problem, heexplainedhow he did it.Did he do it “by consideration of logical correlation alone”? I do not know what that is intended to mean. Correlation normally has to be between two or more variables. In the post you talk about an agent taking account of “logical correlations between different instances of itself”. I don’t know what that means either.

More to the point, I don’t know why it is desirable. Surely one just wants to make the right decisions.

Expected utility maximisation solves this problem fine—if the agent has a tendency to use a similar tie-breaking strategy to Mitchell_Porter . If an agent has no such tendency, and expects this kind of problem, then it will aspire to develop a similar tendency.

This sounds like your decision theory is “Decide to use the best decision theory.”

I guess there’s an analogy to people whose solution to the hard problems that humanity faces is “Build a superintelligent AI that will solve those hard problems.”

Not really—provided you make decisions deterministically you should be OK in this example. Agents inclined towards randomization might have problems with it—but I am not advocating that.

This was my thought process: To get the $10, my copy and I have to choose differently. I am 1, he is 2. I have to choose A or B… At some point I thought of the mapping A=1, B=2, implicitly as part of the bigger mapping (A...Z)=(1...26) I suppose. I noticed that this was a particular mapping which had spontaneously presented itself to me. So it must be a natural one for me to think of; so there is a good chance my copy will think of it as well. So I select A, hoping my copy went through an analogous process, arrived at the same mapping and selected B.

Here you are relying on omega using two ordering systems that we already find highly correlated.

What if Omega asked you to choose between a blegg and a rube instead of A and B. Along with that, Omega tells you that it did not necessarily use the same ordering of blegg and rube when posing the question to the copy.

EDIT: More thoughts: If you can’t rely on an obvious correlation between the player labels and choices, why not have a strategy to make a consistent mapping from the player labels to the choices.

The key to winning this game is having both parties disagree. If both parties know the goal and have a consistent mapping process, it would be trivial for them to arrive at different choices.

A simple mapping would be alphabetize the player labels and the choice labels. Player(1) ⇒ choice(1), Player(2) ⇒ choice(2), Player(n) ⇒ choice(n).

Lexicographic ordering is indeed the most obvious one here.

If only Omega let you use quantum pseudo-telepathy.

I don’t understand this statement. What do you mean by “logical correlations”, and how does this post demonstrate that they are insufficient for getting the right solution?

Suppose you’re choosing a strategy S for a cooperation game with some other entity X, which you are told nothing about. Then U(S) = .5

(S(1)!=X(2)) + .5(S(2)!=X(1)) In this case, you have to choose a probability distribution over other entities X, and choose S to optimize the utility function based on that. There’s no way around that. If we’re told that X was given the same utility function, and is trying to optimize over it, then that greatly narrows down the possibilities for what X is. We assume that X is chosen, by some unspecified but intelligent process, to also optimize U. Fortunately, English culture provides a standard mapping between numbers and letters (A=1, B=2, C=3, …); so if we assume X has some probability of coming from a similar culture and choosing that mapping for that reason, and will choose an arbitrary random mapping otherwise, then we’re better off with the standard mapping.If the other agent has a different utility function, then that changes your probability distribution over what that agent is. If we’re told that the other agent is supposed to implement the utility function “1 if it chooses A, 0 if it chooses B”, then its implementation is probably going to be to just return A, so we should always return B.

Now assume that when we enter into the coordination game, we’re told something about A, and A is told something about us. Then our utility function is U(S) = .5

(S(1,X)!=X(2,S)) + .5(S(2,X)!=X(1,S)) We still need a probability distribution over Xs, but this time the distribution includes Xs that model S. If we’re also told that X is supposed to be optimizing the same utility function, then we can assign some probability to it modeling S with each of various techniques, and to it being model-able with each of various techniques. Not all modeling techniques will work on all functions—some of them lead to infinite regress, some are just bad designs that can’t model anything accurately, etc—so to maximize the probability of successful coordination we should both make S easy for X to model, and make S try to model X.Different kinds of games lead to different kinds of likely opponents, hence the field of game theory. A nash equilibrium is any pair of strategies that optimize utility under the assumption that the other is their opponent.

Does UDT not allow randomization, or is Omega just disallowing it in this case?

EDIT: Or does it turn out to be irrelevant?

You don’t want to randomize in this case. Omega has given you a unique ID. Exploiting that information in a well-chosen deterministic strategy gets you a higher chance of winning than does picking randomly.

1A 2B makes more sense than the reverse—I would go with that.

The “by symmetry” argument in the post should be rejected, since being given “1″ and being given “2” are not really “symmetrical”.

UDT allows randomization. Any prevention would be due to Omega’s handling of expected randomization.

(Pardon me, I should say that I believe UDT allows randomization based on my prior suggesting UDT is not obviously insane.)