This is roughly the 5th article on a "Revolution in Game Theory" that I've seen and not a single one of them has successfully described what the so called revolutionary strategy is.
Here is a direct link to a PDF of a paper defining the Zero Determinant Strategy. http://arxiv.org/pdf/1208.2666v1.pdf I don't really understand it. Neither do journalists it seems. Can someone explain?
Wow. Not long ago I was trying to test whether an idea for a certain kind of online interaction would work out the way I hoped. I built a simulation with agents that evolved their strategies, trying to optimize their own outcomes.
I don't believe this outcome is optimal from an evolutionary standpoint. It requires the other player to know what your strategy is and accept the ultimatum (I will do x% better than you). The other player still has the option to reject the ultimatum, screwing over both parties.
In other words, for this strategy to work it needs "pragmatic" other players that are (a) intelligent enough to know what the other player is doing and (b) willing to accept the unequal outcome. This strategy isn't symmetric: if you turned two of these bots against each other the total outcome would be much worse than if two TFT bots played each other.
Thus it stands to reason that TFT or some small variant is still the evolutionary winner.
What I'm thinking is, it appears that players with a theory of mind may not actually use the optimal evolutionary strategy. Since humans have a theory of mind, my simulation may be a poor predictor of what my users would actually do.
I found the second link you posted when I went looking for clarification. The takeaway for me was that zero-determinant strategies do tend to win games, in that the end score of someone playing a ZD strategy is usually higher than that of someone playing a different strategy, but that both scores in such games tend to be low. Against a basic tit-for-tat player, the ZD player will win by a very small margin but the game will largely consist of mutual defections. Basic tit-for-tat, as the author points out, is never a 'winning' strategy--it always either ties or loses. Its strength isn't that it wins, but rather that it limits its losses. That characteristic of tit-for-tat is as effective against a ZD strategy as it is against any other strategy.
I dunno, I think this sentence in the 2nd paragraph is quite clear:
Such games can be described by
a Markov process defined by the four probabilities that char-
acterize each of the two player’s strategies [9] (because this is
an infinitely repeated game, the probability to engage in the
first move–which is unconditional–does not play a role here).
Each Markov process has a stationary state given by the left
eigenvector of the Markov matrix, which in this case describes
the equilibrium of the process.
...
If we assume that two opponents play a sufficiently large
number of games, their payoff approaches the payoff of the
Markov stationary state [1,9]. We can use this mean expected
payoff as the payoff to be used in the payoff matrix E that
will determine the ESS.
TLDR: keep varying your strategy until you win in the long term, using your mean expected payoff as a payoff matrix in each subsequent state.
At a high level the strategy is pretty intuitive. It's why cognition is important. if you're opponent purely optimizes for choosing the best option for each individual decision, or even for the game, you can take advantage of them. If you face someone who understands how you're trying to manipulate them then it turns into a whole different situation.
As far as I understand, the basic idea is to cooperate with a probability that depends on whether your opponent cooperated in the previous round. (Actually there are 4 probabilities because the strategy also depends on whether you cooperated in the last round.)
For example, you might say that you will cooperate with prob 75% if I cooperated, or 25% otherwise.
By choosing the probabilities carefully this can turn the game into one where the more I choose to cooperate, then the better I do - but you will always outperform me (unless I choose to always defect when we both end up equally lost).
Feels similar to only proposing win-lose business deals. You will never "lose" but will soon run out of people willing to deal.
It's inspiring to note that Bill Press is 64 years old, and Freeman Dyson is 88! Here's a more technical explanation, gleaned from http://arxiv.org/abs/1208.2666.
Each prisoner dilemma round has four possible outcomes: CC, CD, DC and DD (C=Cooperate,D=Defect). We study probabilistic strategies which depend only on the previous round, so for instance, P(player 1 plays C at round i+1|CD was played at round i)=0.3. From the paper:
Such games can be described by a Markov process defined by the four probabilities that characterize each of the two player’s strategies [9] (because this is an infinitely repeated game, the probability to engage in the first move–which is unconditional–does not play a role here). Each Markov process has a stationary state given by the left
eigenvector of the Markov matrix, which in this case describes the equilibrium of the process. The expected payoff is given by the dot product of the stationary state and the payoff vector of the strategy. But while the stationary state is the same for either player, the payoff vector–given by the score received for each of the four possible plays CC, CD, DC, and DD–is different for the two players for the asymmetric plays CD and DC. Because the expected payoff is a linear function of the
payoffs, it is possible for one strategy to enforce the payoff of the opponent by a judiciously chosen set of probabilities that makes the linear combination of determinants vanish (hence the name ZD strategies). Note that this enforcement is asymmetric because of the asymmetry in the payoff vectors introduced earlier: while the ZD player can choose the opponent’s payoff to depend only on their own probabilities, the payoff to the ZD player depends on both the ZD player’s as well as the opponent’s probabilities. This is the mathematical surprise: the expected payoff is usually a very complicated function of six probabilities (and four payoff values, for the four possible plays). When playing against the ZD strategy, the payoff that the opponent reaps is defined by the payoffs and only two remaining probabilities that characterize the ZD strategies.
"Nevertheless, in a single game, the best strategy is to snitch because it guarantees that you don't get the maximum jail term. "
Huh? That isn't correct is it?
Snitching is best because it gives the highest payout regardless of what the other person does (snitching strictly dominates not snitching).
If the other does not snitch and you do, you go free.
If the other person does snitch, you should as well, because it gives you 3 months instead of 6.
Not really. The sentence says that the reason you defect is specifically to avoid the worst possible result. This is bad logic. The reason defect is considered dominant is because, regardless of what the other player does, your outcome is better if you defect. It is a very different statement.
Hrrm... I'm no game theory expert, but I thought that it had been known for some time that there were strategies which could beat "tit for tat" in IPD. But this article makes it sound like "tit for tat" was the state of the art until this particular discovery.
Guess I need to go back and do some more reading on the subject. This reminds me, I've been meaning to read The Evolution of Cooperation[1] forever, and haven't found time yet. sigh
I'm getting pretty tired of technology review extremely poor explanations for science discoveries. I only know vaguely more than I did when I started reading the article. When I got to the bottom I didn't know if I should be excited or not.
Except it is not difficult to prove that you can't win in expectation against a random RPS player. The programming competition is only fun because the players are nonrandom.
Nope, the random player ends up in the middle, on average.
If you'd join a tournament with only random players then your optimal strategy, actually doesn't matter because you're going to randomly win/lose (because your opponents always play random) and on average, end up in the middle rank, just like everybody else.
If there are other non-random players, then you can aim for the middle rank by playing random (and one of the other non-random players will do better than the other, and you will do worse than that one and better than the other). Or you can try to beat the non-random players (while still doing near 50/50 odds on the random ones).
So if you play random, and there is more than one non-random player, you will lose.
Also, check out some of the strategies. There's a couple of really well-performing ones that are incredibly clever, but not all that hard to understand.
One goes like this:
It's based on trying to predict the opponent's strategy and playing against that. But what if the opponent expects that and plays you against that? A-ha but what if you play against your opponent trying to play you?
You'd think that this reasoning can go on forever, but since there are only three moves in RPS, and they circularly defeat eachother, you only need to consider these three steps as the next one brings you to the move of square one.
The algorithm then basically keeps a running score of these three different depths of pre-emptive strategy, and picks the one that works best based on historical moves "what would have won" (and runs random while collecting data).
This algo did extremely well in the first couple of tournaments held many years ago. I don't know what the most popular strategy is currently though.
It just takes one nonrandom player in the pool of players for a more advanced strategy to gain an advantage. Consequently, it is almost certain that, given a tournament of an arbitrary set of RPS players, a very sophisticated RPS player will end up winning, whereas random players will always be relegated to mediocre results.
"This and similar "quant" nonsense is precisely what has led to the biggest uncollectable pile of derivative securities and assorted options of the too big to fail banking disaster we are currently confronting. All these gambling scenarios are counter productive to improvements necessary to increasing the world's real physical wealth. These studies are worse than useless, they are parasitic cancers on society."
This comment below the article made me smile. I kind of feel inclined to agree, even though it is of course too simplistic a view.
Do you actually think this comment is intelligent or otherwise improves the discussion others are having about this topic? For one, I am pretty much completely lost in terms of what this article is saying, and inane commentary does not help in the process of understanding.
The gist of the article is that a long held strategy has been found inferior. This has been hailed as a revolution by some (the author and no doubt some of the community he's writing about) while the longer view, which can be seen in the post's comments is that it's expected:
"This just seems to be an extension of the folk theorem to the family of probabilistic strategies. 1. Any level payoff level can be supported in equilbrium (Press & Dyson), 2. None of those equilbria survive evolutionary refinements (this paper). Interesting stuff, but not quite revolutionary as it is suggested. "
The reference I was making was a famous line from the Simpsons where Bart's strategy of "good old rock, nothing beats that" is adapted to by Lisa who picks paper. Yeah I misquoted, season 4 was a long time ago :)
No it probably wasn't the most intelligent or informed comment I could have made. Sometimes the not most intelligent comments can be useful. Maybe this one wasn't one of those but, /shrug I would probably make it again.
Here is a direct link to a PDF of a paper defining the Zero Determinant Strategy. http://arxiv.org/pdf/1208.2666v1.pdf I don't really understand it. Neither do journalists it seems. Can someone explain?