Predictably Intransitive Preferences

The transitivity axiom is common to nearly all descriptive and normative utility theories of choice under risk. Recent experiments claim to show that observed intransitive preference cycles are no more than noise. We take issue with this consensus position and its normative defence of transitivity. We draw upon the ‘Steinhaus-Trybula paradox’ as a recipe for the bespoke design of lottery pairs over which preferences might cycle. We run an experiment to look for cycles and to test transitivity’s implication of expansion/contraction consistency between binary and ternary choice sets. Even after considering possible stochastic but transitive explanations, we find cycles can be the modal preference pattern over these simple lotteries and also find systematic violations of expansion/contraction consistency. We conclude with a defence of these preferences, including a novel argument against the money pump.


Introduction
The descriptive adequacy of Expected Utility Theory (EUT) has been questioned as an account of choice under risk at least since Allais (1953) presented his famous 'paradox' examples. One axiom of EUT has been questioned less than most: transitivity. Transitivity says if a decision maker prefers A to B and B to C, then he or she should also prefer A to C.
Choice cycles cannot occur except when a decision maker happens to be exactly indifferent between A, B and C or she makes a mistake. Many consider the logic of transitivity to set a benchmark of rational choice behaviour.
Descriptively also, the consensus is that intransitive preference cycles are vanishingly infrequent. Evidence once taken to indicate systematic intransitivity (Tversky, 1969; Loomes, Starmer & Sugden, 1991) has since been either reinterpreted as not reflecting fundamental intransitivity or found by newer statistical methods to be compatible with noisy but transitive responses (inter alia: Starmer and Sugden, 1993; Baillon et al, 2015; Birnbaum & Diecidue, 2015). A representative example is Birnbaum & Schmidt (2008) who conclude: "…we think the burden of proof should shift to those who argue that intransitive models are descriptive of more than five percent of the population".
In light of this consensus, normative and descriptive, in support of the axiom, consider the following three statistically independent lotteries: A yields $4 with probability 2/3 and $1 with probability 1/3; B yields $3 for sure; and C yields $5 with probability 1/3 and $2 with probability 2/3. Let us consider a decision maker (DM) whose preference is to maximise the probability of receiving a higher monetary amount. Given this preference in a direct binary choice for the most probable winner, the decision maker chooses A over B because A yields a higher outcome than B with probability 2/3. Similarly, between B and C, the decision maker chooses B over C because B yields a higher outcome than C also with probability 2/3. Finally, in a direct binary choice between C and A, the decision maker chooses C over A because C yields a higher outcome than A with probability 5/9, or 55.5%. These preferences illustrate the cycle A ≻ B ≻ C ≻ A, in violation of the transitivity axiom.
This type of cycle is an illustration of the Steinhaus-Trybula paradox (STP: Steinhaus & Trybula, 1959). The paradox can be stated as follows: Let choice objects A, B, C be independent random variables and let Pr(A>B) denote a probability of choosing A over B. It is possible for Pr(A>B), Pr(B>C) and Pr(C>A) to all exceed 50%, given preference for the winner.
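The winning probabilities in the A, B, C example can be checked by direct enumeration. The following is a minimal sketch in Python; the function names `prob_beats` and `expected_value` are illustrative and not part of the original study.

```python
from fractions import Fraction
from itertools import product

# Lotteries from the example above, written as {outcome: probability}.
A = {4: Fraction(2, 3), 1: Fraction(1, 3)}
B = {3: Fraction(1)}
C = {5: Fraction(1, 3), 2: Fraction(2, 3)}


def prob_beats(first, second):
    """Probability that an independent draw from `first` exceeds one from `second`."""
    return sum(p * q
               for (x, p), (y, q) in product(first.items(), second.items())
               if x > y)


def expected_value(lottery):
    """Expected monetary value of a lottery."""
    return sum(x * p for x, p in lottery.items())


p_ab = prob_beats(A, B)
p_bc = prob_beats(B, C)
p_ca = prob_beats(C, A)

print(p_ab, p_bc, p_ca)  # 2/3 2/3 5/9
print(expected_value(A), expected_value(B), expected_value(C))  # 3 3 3
```

All three probabilities exceed 1/2, so a decision maker who always picks the more probable winner cycles. In this particular example the three lotteries also share the same expected value.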
In the above example the smallest of the three 'winning' probabilities was 55.5%. Steinhaus & Trybula (1959) showed that for three choice objects in the STP given probable-winner preferences, each with three attributes, the theoretical maximum 'minimum' (max-min) winning probability while preserving the cycle is (√5 − 1)/2, or 61.8%. 1 The STP has passed mostly unnoticed in the decision theory literature (exceptions include Butler & Hey, 1987; Anand, 1993; Blavatskyy, 2006; Rubinstein & Segal, 2012). Of these, only the latter two go beyond a single example to explore implications for decision theory. As a mathematical puzzle the STP is more widely known and has inspired a small but ongoing literature in applied statistics 2 .
Given the STP relies on a rather unusual, even extreme, preference for the most probable winner, does it have much relevance for decision theory? 3 This paper raises the possibility that the answer is yes: we conjecture that the STP can serve as a recipe book to guide selection of the parameters for cyclical choices over lotteries.
In the spirit of Allais' famous example, consider one such 'bespoke' lottery set. Assume your preferred lottery will be played out for real money. The three choice objects are statistically independent.
If you prefer Y ≻ X, Z ≻ Y and X ≻ Z you have exhibited the preference cycle X ≻ Z ≻ Y ≻ X, along with the modal individual in our experiment. The opposite cycle is X ≻ Y ≻ Z ≻ X; we found these two intransitive patterns together exceeded, by a small majority, the six transitive patterns combined.
1 Intriguingly, this value is the conjugate of the Golden Ratio, φ.
2 Not by this name; usually simply 'intransitive dice'. The dice are reputed to be a favourite puzzle for persons such as Warren Buffett and Bill Gates.
3 The STP also demonstrates a contradiction in the popular Wilcoxon-Mann-Whitney rank-sum U test: it can show A to be significantly stochastically larger than B, B larger than C and yet also that C is larger than A. We thank Nick Feltovich for this observation.
But as Regenwetter et al (2011, p.414) explain, intransitive behaviour could be highly undesirable: "...if preferences are not consistent with strict weak orders, then we may have to give up modelling choice through numerical representations. This would have far-reaching consequences, for example, in modelling economic behaviour".
We seem to have reached an apparent contradiction. How might we reconcile a strong consensus for the transitivity axiom with the simple counter-examples offered here? We will show that notwithstanding the compelling logic of the transitivity axiom, preference cycles involving simple lottery pairs can be found which are very unlikely to be just noise. Indeed we go further; we will offer a normative defence of these cycles. We will see that in doing so surprisingly little of existing economic theory's predictions need change, rather as much of Newtonian physics remains after the relativistic revolution. It is in the implications for how economists think about the underlying process of choice under risk and how best to model it that the deeper conceptual implications of our results will be found.
While our attention in the current paper will focus on the abstract world of preferences over simple lotteries, the relevance of these objects to real economic decisions involving choice under risk should not be overlooked. Steinhaus & Trybula (1959) give an application to testing the relative strength of randomly selected steel bars A, B, C, for which successive comparisons could exhibit a cycle with each bar stronger than the next but the last being stronger than the first. For another example not reliant on 'winning', suppose the consequences on each of X, Y and Z instead refer to historical frequencies of wheat harvests of different sizes for each of three farms. A buyer's binary preferences over these farms might cycle as our subjects did over lotteries X, Y and Z. Many other applications of choice under risk could be substituted here. Preferences may also cycle when the attributes are measured in different units, such as if X, Y and Z each represent a different crop. Or they might be health-state ratings for various dimensions of health (mobility, depression, pain, etc.) 4 over which a patient's preferences may cycle.
To be clear, we will not be testing experimentally the STP with its induced preference for the most probable winner. Given the preferences induced by this rule, everyone should cycle for all collections of statistically independent multi-attribute risky objects (random variables) fitting the STP criteria, save for an occasional 'transitive error'. Our goal is to identify the structure of choice objects which satisfy the STP, then investigate if that structure can help select lottery parameters over which preferences may cycle when 'probable winner' preferences are not induced. We conjecture that there is a class of lottery parameters where individuals who typically obey transitivity may exhibit preference cycles. If we are correct, the implications for utility theories are potentially far-reaching. We design an experiment to investigate.
The rest of the paper is organised as follows. In Section 2 we discuss a number of theoretical issues and how the STP fits in. In Section 3 we look for lessons for the bespoke design of lottery pairs for our experiment. In Section 4 we present the results of the experiment. Section 5 addresses the fundamental question of the money pump and Section 6 concludes.

Theoretical Issues
2.1 How can 'True Intransitivity' be Possible?
In his highly influential 1969 article 'Intransitivity of Preference', Amos Tversky lamented "…in the absence of a model that guides the construction of the alternatives, one is unlikely to detect consistent violations of weak stochastic transitivity (WST)". To date, researchers have taken different paths to search for preference cycles: by the selection of subjects based on their tendency towards intransitivity (e.g., Tversky, 1969; Ranyard, 1977); the use of individually tailored stimuli (e.g., Baillon, et al, 2015) or even the use of finer-grained 'strength of preference' measures to discern latent intransitive tendencies (Butler et al, 2013), none of which found convincing evidence. For example, Tversky's evidence can also be explained by mixtures of transitive preferences; see Iverson & Falmagne, 1985; Birnbaum & Gutierrez, 2007. For objects evaluated jointly, evaluation 'within-object' satisfies transitivity simply because it (implausibly) ignores all between-object comparisons, or contrasts, when identifying one's preferences. But even under comparative evaluation of any form, intransitive cycles cannot occur with fewer than three attributes (see proof in Tversky (1969); he notes this was first proved by Morrison (1962)). Our simple example for objects {A, B, C} in Section 1 is a minimalist illustration of the STP for which preference requires between-object comparison of attributes. The transitivity axiom of EUT makes no reference to how individuals decide one object is preferred to another. Because this restriction is not acknowledged explicitly, the axiom implicitly assumes any between-object comparisons must take a particular and restrictive form to avoid potential intransitivity or the resulting decisions need not necessarily satisfy transitivity (see Tversky, 1969; Fishburn, 1981; Loomes and Sugden, 1982).
Eye-tracking experiments provide clear evidence of individuals' tendency to make 'between-object' valuations (Russo & Dosher, 1983; Arieli et al, 2009; Noguchi & Stewart, 2014), leaving this implicit assumption problematic.
In his recent book Kahneman (2012) observed that "The errors of a theory are rarely found in what it asserts explicitly; they hide in what it ignores or tacitly assumes". With reference to the STP definition, the implicit assumption in the transitivity axiom for choice under risk requires the largest 'minimum' probability of winning, across the three binary comparisons, to not exceed 50%. Therein lies the flaw in the use of the axiom; our earlier illustration demonstrated a max-min winning probability of 55.5% and the theoretical limit has been proven to be 61.8%. By implicitly assuming a limit of 50% other erroneous assumptions follow, from the presumed irrelevance of the deliberation process to imposing consistency conditions on preference rankings as the contents of the choice set change. An important lesson from the STP is that utility theories using the transitivity axiom in their preference representation need to address these restrictions on its applicability; currently, they do not.

Contraction and Expansion Consistency
Between-lottery evaluation allows preferences to depend in part on the nature of the objects in the choice set which are not subsequently chosen. Unlike 'within-lottery' evaluation, we need therefore to specify our preferences afresh for each new choice set in which an object appears, including for any subset of a choice set for which our preference was previously revealed. If our choice were to change in the subset, our decisions would violate contraction consistency. Analogously expansion consistency is violated if a binary preference ranking reverses when the pair is embedded within a larger choice set.
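The two consistency conditions can be made concrete with a short check on one participant's responses. This is a minimal sketch in Python; the function names, the data layout and the example responses are hypothetical and chosen only for illustration.

```python
def violates_contraction(ternary_ranking, binary_choices):
    """True if the option ranked first in the ternary set loses
    any binary choice against another member of that set."""
    top = ternary_ranking[0]
    return any(binary_choices[frozenset((top, other))] != top
               for other in ternary_ranking[1:])


def expansion_violations(ternary_ranking, binary_choices):
    """Pairs whose binary choice is reversed in the ternary ranking."""
    position = {option: i for i, option in enumerate(ternary_ranking)}
    return [pair for pair, chosen in binary_choices.items()
            if any(position[option] < position[chosen] for option in pair)]


# Hypothetical responses: a binary cycle X over Y, Y over Z, Z over X,
# together with the ternary ranking Z first, X second, Y third.
ranking = ["Z", "X", "Y"]
binary = {
    frozenset(("X", "Y")): "X",
    frozenset(("Y", "Z")): "Y",
    frozenset(("Z", "X")): "Z",
}

contraction_violated = violates_contraction(ranking, binary)
reversed_pairs = expansion_violations(ranking, binary)
print(contraction_violated, reversed_pairs)
```

Because any strict ranking of three options is itself transitive, a cyclic pattern of binary choices must disagree with the ternary ranking on at least one pair.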
These consistency conditions are better known, albeit slightly misleadingly, as the corollary
Although choice is clearly stochastic, and while apparently intransitive cycles may arise due to error, distinguishing structurally intransitive latent preferences from stochastically satisfied transitivity in experiments is not straightforward. We begin by asking how frequent intransitive cycles can be for individuals with transitive preferences who choose probabilistically.
For example, individuals may have core preferences that are transitive but choice probabilities are determined by embedding these preferences into a model of random errors (e.g., Blavatskyy, 2014). Such a modelling approach can generate a statistically significant asymmetry between two possible intransitive patterns (so-called "regret" and "probable winner" cycles) but it can generate intransitive cycles only up to a limit of 25% of all observed choice patterns, for any triple.
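One way to see where a 25% ceiling can come from is a constant-error ('tremble') specification, which is an illustrative assumption here and not necessarily the error model of Blavatskyy (2014). Suppose the core preference is X ≻ Y ≻ Z and each binary choice is independently reversed with probability e. The sketch below, in Python with hypothetical names, computes the probability of each cycle.

```python
def cycle_probabilities(e):
    """Cycle probabilities for core order X > Y > Z when each of the three
    binary choices is independently reversed with probability e."""
    # Cycle X over Y, Y over Z, Z over X: only the X-Z choice is reversed.
    one_direction = (1 - e) ** 2 * e
    # Cycle Y over X, Z over Y, X over Z: the X-Y and Y-Z choices are reversed.
    other_direction = e ** 2 * (1 - e)
    return one_direction, other_direction


# Search error rates from 0 to 0.5 for the largest total cycle frequency.
grid = [i / 1000 for i in range(501)]
total = {e: sum(cycle_probabilities(e)) for e in grid}
best_e = max(total, key=total.get)
print(best_e, total[best_e])  # 0.5 0.25
```

Under this assumption the total cycle probability simplifies to e(1 − e), which peaks at 1/4 when e = 1/2. For any e below 1/2 the first cycle is more likely than the second, which is the kind of asymmetry the text describes.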
A more promising model of probabilistic choice for rationalizing intransitive cycles as stochastically transitive preference is the random preference approach (e.g., Loomes and Sugden, 1995). For an extreme example, let us consider an individual who has three transitive preference orderings X>Y>Z, Z>X>Y and Y>Z>X with each ordering equally likely to be drawn when a choice is to be made. It is straightforward to see that in a direct binary choice between X and Y, this individual chooses X with probability 2/3. Likewise, in a direct binary choice between Y and Z, this individual chooses Y with probability 2/3. Finally, in a direct binary choice between X and Z, this individual chooses Z with probability 2/3, thereby violating weak stochastic transitivity. Thus, a model of random transitive preferences generates a maximum of (2/3)³ = 8/27 (29.6%) intransitive choice cycles. This limit involves a strongly significant asymmetry between the two possible intransitive patterns; the maximum frequency of a particular cycle given random sampling is ¼; see Rubinstein & Segal (2012) for proofs of these propositions. However, a model of random transitive preferences has another testable implication so far overlooked in the literature, given its focus on binary choice sets. When comparing binary choice data with the choice data from a ternary set, we find a new set of constraints that any stochastic but exclusively transitive preferences must meet. In such models of stochastic choice, the probability of choosing X from the ternary set {X,Y,Z} is given by the probability that a decision maker draws a preference order in which X is preferred to Y and X is preferred to Z. In contrast, for a direct binary choice between X and Y, this decision maker chooses X with a probability that is equal to the probability that he or she draws a preference order in which X is preferred to Y (but X may or may not be preferred to Z).
Similarly, for a direct binary choice between X and Z, this decision maker chooses X with a probability that is equal to the probability that he or she draws a preference order in which X is preferred to Z (but X may or may not be preferred to Y). Hence, any model of random transitive preferences must make the following testable hypotheses. If any one of the three hypotheses fails to hold, no model of stochastic transitive preferences can explain that data. Since, by definition, the probabilities of choosing X, Y and Z from the ternary set {X,Y,Z} must sum up to one, we have the following implication of the model of random transitive preferences: Thus, a decision maker who violates weak stochastic transitivity so that P(X,Y)>0.5, P(Y,Z)>0.5 and P(Z,X)>0.5, must still satisfy the inequality which can be simplified as a triangle inequality
P(X,Y) + P(Y,Z) + P(Z,X) ≤ 2
The triangle inequalities (7) and (8) are sometimes proposed to separate genuine intransitive cycles from random preference predictions. However, Birnbaum (2011) showed that the triangle inequalities may be satisfied even by underlying preferences which are intransitive. Furthermore, recent work by Muller-Trede et al (2015) demonstrates how these inequalities may be violated even when underlying preferences are 100% transitive. Their experiment also shows clear violations of these inequalities.
In other words, the triangle inequalities for stochastic transitive preferences can be satisfied when preferences are intransitive and violated when preferences are transitive, raising a concern that they are not as useful for identifying true intransitive preference cycles as generally believed (e.g., Birnbaum and Schmidt, 2008), though see Cavagnaro & Davis-Stober (2014) for an alternative view. For these reasons, among others, our experiment was not designed specifically to test the triangle inequalities, which ideally would require multiple repetitions of the same lottery pairs for every person; we include just two repetitions.
However we can and do test the novel implications for choice proportion contrasts between the binary and ternary choice sets in Section 4, for each triple which does not satisfy WST.
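The binary and ternary implications of the random preference model can be illustrated with the three-ordering example given earlier. Under that model, choosing X from {X,Y,Z} requires drawing an ordering in which X beats both Y and Z, so the ternary choice probability of X cannot exceed either of its binary choice probabilities, and likewise for Y and Z. The sketch below is in Python; the function names are hypothetical, and it assumes an independent draw of the ordering for every choice.

```python
from fractions import Fraction

# The three equally likely transitive orderings from the example (best first).
orderings = [("X", "Y", "Z"), ("Z", "X", "Y"), ("Y", "Z", "X")]
weight = Fraction(1, len(orderings))


def binary_prob(a, b):
    """P(a chosen from {a, b}): share of orderings ranking a above b."""
    return sum(weight for o in orderings if o.index(a) < o.index(b))


def ternary_prob(a):
    """P(a chosen from {X, Y, Z}): share of orderings ranking a first."""
    return sum(weight for o in orderings if o[0] == a)


p_xy = binary_prob("X", "Y")
p_yz = binary_prob("Y", "Z")
p_zx = binary_prob("Z", "X")

# Frequency of each cycle when every choice uses a fresh draw.
cycle = p_xy * p_yz * p_zx
reverse_cycle = (1 - p_xy) * (1 - p_yz) * (1 - p_zx)

# Ternary choice probabilities are bounded by the binary ones.
bounds_hold = all(
    ternary_prob(a) <= min(binary_prob(a, b), binary_prob(a, c))
    for a, b, c in [("X", "Y", "Z"), ("Y", "X", "Z"), ("Z", "X", "Y")]
)

print(p_xy, p_yz, p_zx)      # 2/3 2/3 2/3
print(cycle, reverse_cycle)  # 8/27 1/27
print(bounds_hold, p_xy + p_yz + p_zx)  # True 2
```

In this extreme example the three binary probabilities sum to exactly 2, so the triangle bound holds in its weak form, with equality.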
We follow Birnbaum & Diecidue (2015) and repeat each set of choices once; this allows us to identify error rates and use the 'true and error' model as another way to separate underlying preferences from noisy responses. Repetition of the full set of decisions after a distractor task also facilitates other methods of separating noise from true preferences, which we also discuss in Section 5.
Finally, and to pre-empt misunderstandings, we note that the STP is distinct from the well-known 'asymmetric dominance' phenomenon, popular in the marketing literature (e.g., Huber et al, 1982). The idea there is that the addition of an option dominated by the less preferred of the original two objects can sometimes lead people to switch towards the less preferred of the original objects in the context of the larger choice set. Formally, in the choice set {A, B} suppose more people prefer B to A (i.e., B ≻ A). However, when an inferior option Ã is introduced that is dominated only by A, more people then choose A from the set {A, B, Ã} than did so from the set {A, B}, violating IIA. Think of an apple, A, an orange, B, and a bruised apple, Ã. As Ã would not be chosen over A in any binary choice, this is a different violation of IIA than that of the STP; none of the choice sets we use in our experiment make use of asymmetric dominance.

How often does the STP occur?
So far we have drawn upon just one example of STP using three sets of integers. If it were the only possible 3-attribute set of integers to result in a preference cycle, the STP would arguably be little more than a mathematical curiosity. We therefore need to acquire a sense of how rare such cases are for typical risky decision problems. For three-attribute objects the set of possible triples increases with the range of integers, N, that can be used to represent consequences. From a decision theory perspective, the set of possible triples drawn from, say, integers 1-10, include a large majority that are simply rearrangements of the same underlying gambles. Eliminating rearrangements but allowing integers to be reused within each object (though to avoid ties, not reused across the three objects), the formula we derived is: This set is suited to guiding parameter selection for experimental testing because reuse of integers on the same choice object reduces the cognitive burden for a clearer picture of underlying preference structures. Ranges of n are shown in Table 1, column 1; values of x are shown in column 2.
Equation (2) is, we subsequently learned, the set known as 'double tetrahedral' numbers 6 for the relation between integer range and size of the sample space. The result applies generally for the number of unique gambles that can be constructed from three attributes on each of three choice objects for a given range of integers. As there is no formula to calculate the number of intransitive triples from this set, we need to test each of the triples in column 2 for STP individually. Developing and programming a code to achieve this in Matlab 7 produced the results shown in Table 1, column 3. … can be much larger on occasion. Calculations for these larger sums quickly become impractical; suffice to note that the proportion of intransitive triples increases in the range of integer values up to an asymptotic limit 8 ; Table 1 shows that for 1-10 it is 1:464, for 1-20 it is 1:155 and for n=1-∞ it is 1:112. It is this latter figure that offers the best 'ballpark' answer to the question of how often these risky objects are likely to fall within the STP.
6 A web search subsequently identified this set as integer sequence A140236 in the On-Line Encyclopedia of Integer Sequences.
7 The MATLAB code is available from the authors on request.
8 We thank Igor Kopylov for showing the limit for this proportion is .
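The triple-by-triple test described above can be sketched as a brute-force search. The Python code below is not the authors' MATLAB program. Its counting convention is an assumption: objects are unordered multisets of three integers, triples are unordered, and no integer is shared across objects. Its totals therefore need not reproduce the figures in Table 1.

```python
from itertools import combinations, combinations_with_replacement, product


def beats(a, b):
    """True if a draw from `a` exceeds a draw from `b` in more than half
    of the equally likely attribute pairings."""
    wins = sum(x > y for x, y in product(a, b))
    return 2 * wins > len(a) * len(b)


def is_stp_cycle(a, b, c):
    """True if the three objects form a probable-winner cycle in either direction."""
    forward = beats(a, b) and beats(b, c) and beats(c, a)
    backward = beats(b, a) and beats(c, b) and beats(a, c)
    return forward or backward


def count_triples(n):
    """Return (number of admissible triples, number that cycle) for integers 1..n."""
    objects = list(combinations_with_replacement(range(1, n + 1), 3))
    total = cyclic = 0
    for a, b, c in combinations(objects, 3):
        # Skip triples that reuse an integer across objects (this avoids ties).
        if set(a) & set(b) or set(a) & set(c) or set(b) & set(c):
            continue
        total += 1
        cyclic += is_stp_cycle(a, b, c)
    return total, cyclic


# The Section 1 example written as three-attribute objects.
print(is_stp_cycle((4, 4, 1), (3, 3, 3), (5, 2, 2)))  # True
```

Calling `count_triples(6)` gives a count that can be compared with the 12 intransitive triples reported for integers 1-6, bearing in mind the assumed counting convention.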

[Table 1. Columns: range of integer values, n; number of possible triples (double tetrahedrals), x; % STP]
But only a part of even this sliver of parameter space is involved in these cycles; Table 2 gives a sense of the structure of these objects. The 12 triples using integers 1-6 from Table 1 are shown; the final row of Table 2 is the unique example we gave in Section 1. 9 Crucially, we see that the 12 sets have similar ingredients to each other; furthermore, these patterns continue for the 3,872 intransitive sets for 1-10 integers and beyond. One reason why experiments to date have only rarely found evidence of intransitive behaviour is, as Tversky lamented, lack of knowledge of the more likely locations of suitable lottery parameters in parameter space. The choice of parameters for any three lotteries would stray into the 'black hole' of the STP no more than once in a hundred occasions, were they distributed randomly, which is why most predictions based upon transitivity remain unaffected. But with this new knowledge we may choose to navigate to this destination via bespoke design, as we will do in Section 4.
Finally, we note that the maximum smallest margin of victory in any cycle increases with the number of choice objects (to an asymptotic maximum of ¾ for a very large number of choice objects), suggesting a larger intransitive proportion for sets of four or more choice objects.
Getting close to this upper bound also requires more attributes on each object. Limiting the number of attributes per object to 6, the upper bound is ⁄ for an arbitrarily large number of choice objects, only slightly greater than (√5 − 1)/2. These results are long established in the mathematical literature; see Usiskin (1964) for more details. Although the frequency of STP can rise slightly above 1% in some of these more general cases, we restrict our investigation to three choice objects of three attributes each as the simplest way to make our points.
Cognitive limitations suggest no more than a few attributes on each of several objects are typically considered together, so the more complex cases have few implications for decision theory.

Bespoke Intransitive Lottery Ingredients
To keep the decisions accessible and reduce complexity we focus on the subset of parameter sets meeting the STP that use no more than two different integers per object. To avoid ties no integer is repeated across objects in any of our triples. The probability of each outcome in our design is always 1/3, 2/3 or 1. Furthermore each consequence represents a sum of money, in £, a very familiar, directly comparable outcome for which magnitudes are easily interpretable by our participants.
What are the other key ingredients identified from the set of objects meeting our conditions?
For convenience we use the 3-attribute case and integer range 1-10 as the recipe book for our experimental design. In this case there are 3,872 unique intransitive triples (see Table 1), a number small enough for visual inspection but large enough for meaningful variation.
Expected value differences between each choice object within a given triple for this set are found to be at most 8 ± 1⅔, or ± 20.875%. That the expected values can differ non-trivially among the choice objects in intransitive triples is another new discovery and also a feature we will exploit in our bespoke designs.
Inspection of these 3,872 triples reveals that one, in Table 2 'Y', is typically degenerate, or at least nearly so, resembling a 'certainty equivalent' of some lottery. 'X' resembles the 'P-bet' made famous from preference reversal experiments by Slovic and Lichtenstein (1971) and 'Z' resembles the $-bet from the same experiments. These resemblances are another important inductive inference from the STP sets to the bespoke design of our lotteries. It may be no coincidence that the preference reversal phenomenon was observed originally, if the typical parameters of those bets resembled, presumably unintentionally, STP objects.
But the lotteries X and Z also have one important difference from the original P and $-bets.
Objects satisfying the STP have a larger minimum consequence for Z than for X; conventionally in the preference reversal literature these are either equal (at zero) or the $-bet has the lower minimum consequence. Together, these two insights will be key ingredients of the STP recipe we will translate into the design of lottery pairs played under standard incentives.
As noted earlier, expected values of the STP objects comprising a triple need not necessarily be similar amounts. Taking advantage of this fact, we mimic the CE, $ and P-bets from the preference reversal literature, choosing most of our lottery parameters from the (minority of) STP sets which had expected values in the following order: $ > P > CE, i.e., Z > X > Y. We will use the latter notation for these modified 'PR' bets.
By requiring that each of our objects reuse integers and that the expected values take on a particular ranking, we greatly reduce the set of 3,872 objects satisfying the STP with which to inform the selection of our lottery parameters. The most relevant possible consequence of our decision is that the direction of cycles may follow that seen most often for preference reversals ('regret' cycles; see Loomes & Sugden, 1982), which is the opposite direction to the STP and its 'probable winner' (see Blavatskyy, 2006) cycles. While this is our choice of focus, future researchers may make different selections from the set of STP objects to guide parameter selection in subsequent work. The lotteries used are shown in Table 3.
For any experimental test of intransitivity it is important to keep the presentation of the number of attributes in each object equal for each choice object, rather than coalescing identical outcomes when they arise. This is because past experiments have found the contrast between coalesced and non-coalesced outcomes (also known as event-splitting; Starmer and Sugden, 1993) can be confused as evidence for intransitivity (Birnbaum & Schmidt, 2008; Baillon et al, 2015). In our experiment we keep the number of attributes presented on the choice objects constant even when all three attributes lead to the same outcome.
Earlier tests for preference cycles primarily used state-contingent consequences in matrix-style displays. These displays were thought to facilitate between-act comparisons and enhance the possibility of, for instance, anticipated regret when consequences are state-contingent and with it the potential for cycles. Our design will maintain statistical independence between the choice objects such that any observed preference cycles are more likely to be rooted in description-invariant, intransitive underlying preference orders. We also avoid constructing triples from objects with equal EVs for which motives such as regret or preference for the winner might tip the balance towards a cycle even if the motive is very weak.
Finally, we include a 'standard PR' control set to compare to the STP-modified PR gambles that are the main focus of our experiment. Another innovation in our design is to allow the expression of weak and strong preference for the chosen lottery. This distinction may help in interpreting whether any observed cycles are based on weak, perhaps noisy or vague preferences, or whether they are held as firmly, or more firmly, than the transitive patterns.

Experimental Set-up
A total of 100 participants (all undergraduate students at the University of Warwick) were invited to take part in the experiment. The procedure was programmed using the Qualtrics software and consisted of 100 questions divided into 5 parts (see Figure 1 below).

Figure 1: Experimental Flow
In Part 1, we disentangled the 11 triples into binary choices between individual lotteries and asked participants to answer 33 questions (3 binary choice questions per triple). Table 3 provides a detailed list of all triples. Each binary choice was presented in the format shown below in Figure 2 with two options, Left and Right. 10 Each option represented a lottery with 3 equi-probable outcomes. The starting point for each slider was "No preference". However, participants were not able to proceed by leaving the slider in the original position (i.e., the choice of "No preference" was not allowed). 11 Participants were able to move the slider to the right and opt for "Slightly prefer Right" or "Strongly prefer Right" or, alternatively, to move the slider to the left and choose "Slightly prefer Left" or "Strongly prefer Left". Irrespective of whether a participant indicated slight or strong preference, we used only revealed preferences for "Left" or "Right" in the payoff calculations. All 33 questions were randomised for each individual separately.
10 Detailed experimental instructions as well as further screenshots are provided in the Appendix.
In Part 4, we repeated all 33 binary questions but presented them in a different random order to each participant.
In Part 2, participants were asked to rank the lotteries in each ternary set from most preferred (1) to least preferred (3), for each of the 11 triples (see Figure 3). We sought to maximise the similarity to the choice task and so did not use the Becker-DeGroot-Marschak incentive mechanism. Instead we incentivised the choice of most preferred and the choice of next preferred in each ternary set to obtain the ranking. We use the standard 'random lottery incentive system' in our experiment. The order in which the ternary sets appeared was randomized, as well as the order in which the lotteries appeared on each screen. To avoid lazy acceptance of the default ordering, it was not possible to simply accept the default ranking. If the default was preferred, participants first had to move away from the default and then move back to it by deliberate choice. In Part 5, all 11 ternary choice set problems were repeated in a different random order.

Figure 3: Ternary Choice Display
In Part 3, participants were offered a Distractor Task in order to create a break between the two repetitions of binary choice and ternary choice tasks. The distractor task consisted of 12 questions, repeating questions from three of the 11 experimental triples but presented as a simple choice between two or three state-contingent lottery options. The order of these 12 questions was randomized for each participant and can be seen in the Appendix.
Participants were asked to complete the experiment online (in their own time) and were given a 5-day window to complete all tasks. 12 All 100 participants were then invited to the laboratory to play out their decisions for real money. Each participant drew a question number (between 1 and 100) at random and received payment based on his/her choice in that question. In each question, we looked at the lottery option chosen by the participant and played out that lottery according to the description on the experimental display (see, e.g. Figure 3 and Figure 4). Immediately after the draw, participants received their payoff in cash.
Finally, all participants completed a detailed online survey covering questions such as domain-specific risk attitudes and a variety of demographic variables not reported here.
Table 4 reports the frequency of intransitive cycles for each of the eleven triples. Other than the control (triple 9), we find the proportion of preference cycles averaged across both repetitions ranges from a low of 18% (triples 2 and 6) to a high of 59% (triple 4). In contrast, the PR 'control' triple produced just 6% intransitive patterns. The average proportion intransitive in the first block was 27.1%, followed by 25.8% in the second block, giving an overall proportion of 26.5%. If learning occurred between blocks, it did not noticeably reduce the occurrence of cycles, giving our first clue that error may not be the main cause of cycles. The modal response pattern was an intransitive pattern for three of the ten triples, and an intransitive pattern was runner-up in a further five triples.

Cycle Frequency
At the individual level, between 30% and 85% cycled in each of the ten triples either once or on both repetitions. For instance, in triple 4 no fewer than 85 of the 100 individuals cycled at least once, while 34 cycled both times. Across the ten triples, every single one of the 100 participants cycled at least once. A Spearman correlation of the number of cycles by individual between repetitions was +0.93, suggesting again that the cycles are not simply random error. Figure 5 shows the histogram of cycles by individual. It appears that a significant minority, a plurality, even an occasional majority, exhibit intransitive choice cycles, even for these statistically independent pairs of simple and incentivised lotteries.
12 The majority of participants completed the experiment in 30-40 minutes.
One other way to check whether fundamental intransitivity or noise is driving the data is to divide the participants into two equal-size groups by rate of binary choice switching between the two repetitions. Doing so reveals 51 individuals with 10 or fewer inconsistencies and 49 with 11 or more inconsistencies (or at least, stochastic preferences). Figure 6  This difference is further evidence that true intransitivity rather than noise is responsible for most of the observed cycles. The caveat is there may be a modest uptick of cycles for the most inconsistent of all; but inspection of the graph shows the uptick is driven by just two out of 100 individuals, so it may not be reliable.
The predominant direction of cycles in this experiment is consistent with that sometimes named 'regret' cycles, rather than the 'probable winner' cycles induced by the STP. The 26% average breaks down 19:7 in favour of the 'regret' direction. For the eight triples where an intransitive pattern is either the mode or runner-up, six follow the 'regret' direction and two the 'probable winner' direction. This likely reflects our choice of expected value rankings for the lotteries we used. Unlike the STP, the random lottery incentive system does not impose a particular preference pattern; it elicits preferences rather than induces them.

Noisy but Transitive or Noisy and Intransitive?
There is currently no consensus on how to separate conclusively noisy mixtures of transitive preferences from true underlying intransitive preferences; one school focuses on binary choice probabilities, the other on choice patterns. WST and the triangle inequalities are the preferred tests for the former group, while the 'true and error' model and other tests focussed on preference patterns and their consistency are the preferred choice for the latter group. Our preference is closer to the latter group, following the arguments of Birnbaum (2013) and Birnbaum & Diecidue (2015). Consequently, we repeated all the binary and ternary choices in every triple after a distractor task, and we focus mostly on consistent preference patterns rather than binary choice probabilities.
Are the two intransitive patterns more or less likely to replicate than the six transitive patterns? A close approximation to a true, 'error free' proportion for each preference pattern is to identify the number of subjects making the same three binary choices within a triple on both repetitions. To do so means avoiding six possible choice errors, for each 'true' preference pattern. Across the ten sets of triples (excluding the control) this occurs 366 out of 1000 times. Of these, 117 were of one of the two intransitive orderings and 249 were for one of the six transitive orderings. Thus, the share of revealed consistently intransitive preference patterns among all revealed consistent preference patterns was 32% (117 out of 366). To get a sense of how striking this finding is, a recent and unusually careful and thorough investigation of intransitive choice patterns was able to conclude: "…very few people repeat the same intransitive pattern on two replications of the same test. In other words, most violations that have been observed can be attributed to error rather than to true intransitivity" (Birnbaum & Diecidue, 2015).
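The pattern count behind this comparison can be sketched in a few lines of Python. This is an illustration only: the function name `classify` and the encoding of the three binary choices as booleans are assumptions made for the sketch, not part of the experimental software. It enumerates the eight possible response patterns within a triple and recomputes the intransitive share from the counts reported above.

```python
from itertools import product

def classify(x_over_y: bool, y_over_z: bool, z_over_x: bool) -> str:
    """Label one set of three binary choices within a triple.

    The boolean encoding is a hypothetical convention for this sketch.
    """
    answers = (x_over_y, y_over_z, z_over_x)
    # X > Y, Y > Z and Z > X together form one cycle; the three opposite
    # choices form the reverse cycle. Every other pattern is transitive.
    return "intransitive" if all(answers) or not any(answers) else "transitive"

patterns = [classify(*p) for p in product([True, False], repeat=3)]
n_intransitive = patterns.count("intransitive")
n_transitive = patterns.count("transitive")

# Counts of patterns repeated identically on both repetitions (from the text).
consistent_intransitive = 117
consistent_transitive = 249
share = consistent_intransitive / (consistent_intransitive + consistent_transitive)

print(n_intransitive, n_transitive)  # 2 6
print(f"{share:.1%}")                # 32.0%
```

If the two intransitive patterns were no more likely than any of the six transitive ones, they would make up a quarter of the eight patterns; the 32% share of consistent patterns can be read against that rough reference point.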
Delving deeper, we see that the intransitive proportion of consistently revealed patterns across triples varies from 17.8% (triple 8) to 83.9% (triple 4). The modal consistently revealed patterns are intransitive for triples 4, 7, 10 and 11 and runner-up for triples 1, 3, 5 and 6; that is, eight of the ten triples have an intransitive modal or second modal consistently revealed preference pattern. So, consistently intransitive preferences appear to be revealed relatively more frequently than intransitive preferences that are not necessarily consistent.
This result suggests that noise diminishes (rather than increases) intransitive preferences in revealed choice patterns. In other words, as the noise washes out, cyclical choice patterns increase their share of the total. As a comparison, in the control triple we find the opposite: just 1 consistently intransitive person but 51 consistently transitive people, a 98% transitive share, compared with 94% in that triple for all revealed preferences.

WST and constraints on preferences in ternary choice sets
Weak stochastic transitivity (WST), in which Pr (C > A) must be at least as large as the minimum of Pr (B > A) and Pr (C > B), was violated for Triples 1, 3, 4, 5 and 7; see Table 5.
Triple 4 exhibits the strongest violation followed closely by triple 3. Averaged across both repetitions, in triple 4 we found: Pr (Y > X) = 71.5%; Pr (Z > Y) = 64.5% and also that Pr (Z > X) = 27%. To our knowledge, at 37.5 percentage points, this is the largest magnitude violation of WST so far demonstrated in the lottery choice literature. It is also a violation of 'simple scalability' (Tversky and Russo, 1968).
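The size of this violation follows directly from the definition of WST used above. The short Python sketch below recomputes it; the function name `wst_shortfall` and its argument names are hypothetical, and the inputs are the triple 4 percentages quoted in the text.

```python
def wst_shortfall(p_b_over_a: float, p_c_over_b: float, p_c_over_a: float) -> float:
    """Percentage points by which Pr(C > A) falls short of
    min(Pr(B > A), Pr(C > B)); a positive value is a violation."""
    return min(p_b_over_a, p_c_over_b) - p_c_over_a

# Triple 4, averaged across both repetitions, with A = X, B = Y, C = Z.
shortfall = wst_shortfall(p_b_over_a=71.5, p_c_over_b=64.5, p_c_over_a=27.0)
print(shortfall)  # 37.5
```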
As noted earlier, violations of WST can also result if participants have random preferences over exclusively transitive preference orders. But we also showed an implication of this claim is that the frequency with which each lottery can be chosen from the respective ternary choice set must satisfy the constraints we derived in equations (3)-(5). In other words, any random preference model over transitive orderings that may violate WST must also satisfy all three of these constraints in the ternary sets. If they do not, latent intransitive preferences become still more the likely explanation.
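To illustrate the first claim, the Python sketch below builds a hypothetical population that mixes three transitive rankings in equal proportions. The mixture and all names are assumptions chosen for illustration, not data from the experiment. Every ranking is transitive, yet the aggregate binary choice probabilities cycle. The final loop checks an elementary bound: a lottery cannot be ranked first in the ternary set more often than it is chosen in either binary set. It is offered as one example of the kind of restriction such models imply and is not claimed to reproduce equations (3)-(5).

```python
from fractions import Fraction

# Hypothetical mixture: one third of the population holds each ranking (best first).
third = Fraction(1, 3)
mixture = {("X", "Y", "Z"): third, ("Y", "Z", "X"): third, ("Z", "X", "Y"): third}

def binary_prob(a: str, b: str) -> Fraction:
    """Probability that a is ranked above b under the mixture."""
    return sum((w for order, w in mixture.items()
                if order.index(a) < order.index(b)), Fraction(0))

def top_prob(a: str) -> Fraction:
    """Probability that a is ranked first among all three lotteries."""
    return sum((w for order, w in mixture.items() if order[0] == a), Fraction(0))

# Aggregate binary choices cycle although every ranking is transitive.
print(binary_prob("X", "Y"), binary_prob("Y", "Z"), binary_prob("Z", "X"))  # 2/3 2/3 2/3

# Ternary first-choice shares never exceed the matching binary shares.
for a in "XYZ":
    others = [b for b in "XYZ" if b != a]
    assert top_prob(a) <= min(binary_prob(a, b) for b in others)
```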
Looking first at triple 4 and with reference to hypotheses 1-3, the binary choice probabilities of Pr (X > Y) = 71.5%; Pr (Y > Z) = 64.5% and Pr (Z > X) = 27% impose constraints on the proportion of times each lottery is most preferred from the ternary sets of Pr(X) < 0.27, Pr(Y) < 0.285 and Pr(Z) < 0.355. Our results in Table 6  In summary, every triple in which WST was violated for the binary choice sets failed to satisfy the constraints on choice frequency in the ternary sets that a model of random preference over transitive orderings must satisfy if the WST violations were a result of stochastic but transitive preferences. Taken as a whole, the tests reported above to separate noisy but transitive latent preferences from underlying intransitivity lean heavily in favour of the latter proposition.
Finally, triple 8 was one of two triples where intransitive patterns were relatively infrequent.
The lotteries comprising triple 8 were chosen to be a test of one intriguing 'ingredient' in the STP recipe: a higher minimum consequence for Z than for X. A small pilot experiment had previously identified triple 4 as particularly prone to exhibit cycles (19 of 27 participants cycled). We decided to make as few changes as possible to the lottery pairs of triple 4 when swapping the lowest payoff in X with that in Z. This change then required an increase in the maximum payoff in Z to keep the expected value above that for X. The combined effect of these two changes is to drastically reduce the number of observed cycles, from 119 of 200 in triple 4 down to 42 of 200 in triple 8. Even more drastic is the reduction in the intransitive share of consistently revealed patterns, from 83.9% to 17.8%. While a perfect test of the one ingredient isn't possible, this striking result suggests a key part of the recipe for cycles is keeping the lowest payoff of Z above that of X, a finding that may influence the future design of lotteries in preference reversal research more generally.

Testing Expansion Consistency
We also elicited the most preferred and second preference objects in the respective ternary choice sets to test for violations of IIA. While the discrepancy in the first two binary-ternary comparisons could plausibly be put down to unusually noisy responses, this is not a credible explanation for the third comparison.
Those choice reversals appear to reflect a large proportion of choice-set dependent preferences, in violation of expansion-consistency, as we predicted in Section 2. Adding X to the binary choice between Z and Y leads just over 75% of the individuals who preferred Z to Y in the binary set to rank Y above Z in the ternary set, contrary to IIA. 13
Aggregating across the 10 sets of triples (excluding the control) and being consistent with respect to the objects referred to by these letters, we find 993 binary choices of Z > Y; of these just 469, or 47.2%, maintain that rank in the respective ternary comparisons. For comparison, the binary choices in repetition 1 were consistent with binary choices in repetition 2 for 2043/3000 or 68.1% of decisions. For the binary preference Y > X, 699 of 1204 decisions, or 58.1%, maintain this ranking in the ternary set. For those choosing X > Z in the binary set, 852 of 1240, or 68.7%, maintain the ranking in the ternary set. The latter is very similar to the consistency of binary choices between the two repetitions, but the other two binary comparisons show clear evidence of choice-set dependent preferences, inconsistent with expansion to or contraction from the ternary set.
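The comparison in this paragraph is simple arithmetic, and the Python sketch below reproduces it from the counts quoted above. The variable names are hypothetical; the benchmark is the repetition-to-repetition consistency of the binary choices.

```python
# (maintained in ternary set, total binary choices), as reported in the text.
maintained = {
    "Z > Y": (469, 993),
    "Y > X": (699, 1204),
    "X > Z": (852, 1240),
}
# Benchmark: agreement between the two repetitions of the binary choices.
benchmark = 2043 / 3000

rates = {pair: kept / total for pair, (kept, total) in maintained.items()}
for pair, rate in rates.items():
    gap = (rate - benchmark) * 100  # difference in percentage points
    print(f"{pair}: {rate:.1%} maintained ({gap:+.1f} points vs benchmark)")
```

Only the X > Z comparison lands close to the benchmark; the other two fall roughly 10 and 21 percentage points below it.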
It might be argued that the violations of IIA predicted in Section 2 are properly tested only for the triples of binary preferences which produced cycles. Across the 11 triples (including the control for this test) there were 155 'probable winner' cycles and 386 'regret' cycles. For the regret cycles, in Section 2 we argued our version of the $-bet is preferred to the CE, but the order reverses when the P-bet is added to the choice set. The data reveal that in 237 of the 386 comparisons (61.4%), this binary ranking is indeed reversed in the ternary set. For the probable winner direction, we argued the preference for the $-bet over the P-bet will reverse when the CE is included; we find this to be true on 51 of the 155 occasions (32.9%). It is not immediately apparent to us why the choice-set dependence is so much stronger for the regret cycles.

The Money Pump needs Expansion Consistency
If a series of comparative evaluations were more likely to result in the selection of less preferred final outcomes, they might still be deemed irrational. The best known such criticism of intransitive preferences is the famous 'money pump' or 'Dutch book', originally presented by Davidson et al (1955, p.146). Suppose you prefer C to A and A to B, but B is in turn preferred to C. The money pump argument presents the following scenario. Assume you are endowed with B while I have A and C. I offer you A in exchange for (B + ε) and you accept; I then offer C for A on the same terms; again, you accept; then I offer B for C on the same terms; you accept once again. Your endowment is now (B − 3ε), which is dominated by the B you began with. Nor does it stop here; the sequence of offers is repeated and accepted until you are left with B but no wealth.
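The mechanics of the pump can be traced with a minimal Python sketch. It assumes a decision maker who treats every offer as an isolated binary choice and accepts whenever the offered lottery beats the current holding; the function `run_pump`, the integer wealth units and the starting wealth of 9 are hypothetical choices made for illustration.

```python
# Cyclic binary preferences from the example: A over B, C over A, B over C.
PREFERS = {("A", "B"), ("C", "A"), ("B", "C")}

def run_pump(holding: str, offers: list[str], wealth: int, fee: int = 1) -> tuple[str, int]:
    """Accept every offer that beats the current holding, paying `fee` each time.

    Wealth and fee are in hypothetical integer units of epsilon.
    """
    for offered in offers:
        if (offered, holding) in PREFERS and wealth >= fee:
            holding, wealth = offered, wealth - fee
    return holding, wealth

one_round = run_pump("B", ["A", "C", "B"], wealth=9)
many_rounds = run_pump("B", ["A", "C", "B"] * 5, wealth=9)
print(one_round)    # ('B', 6)
print(many_rounds)  # ('B', 0)
```

After one round the decision maker holds B again but is three fees poorer; repeating the offers drains the remaining wealth.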
Commenting on the money pump, Fishburn (1991) observed: "It is a clever device, but one that applies transitive thinking to an intransitive world". The world of the STP is an intransitive world, by construction, and the money pump fails. Using our previous example, begin with sequential pairs: given the choice of {A, C}; C is chosen over A (C wins ⁄ ).
Now offer a new option, B, to play against either A or C, also for a fee ε. We accept, choosing B to play against C. From {B, C} we win ⅔. Next, offer A, again also for a fee.
This time we decline; {A, B} also wins ⅔, which is no better than our current choice of B over C. So why swap and pay a fee ε? This defence however is dependent on the payment method being as prescribed by the STP; if instead we play out the chosen lottery, in isolation, in accord with usual practice, this particular defence will not work.
A number of earlier authors have offered a variety of replies to the money pump argument under the standard random lottery incentive system (Anand 1993; Loomes & Sugden 1987; Cubitt & Sugden 2001). The nub of their arguments is that a repeated money pump requires a decision maker (DM) who, when interpreting the decision environment, has no memory of previous trades and no expectations of future trades. In this highly contrived scenario a DM can be pumped to bankruptcy. Otherwise the DM will decline all trades after just one cycle, because he or she constructs retrospective and prospective choice sets. This was the defence used for 'regret theory' (Loomes & Sugden, 1982); however, the defence left open which of the three lotteries the DM would, or should, prefer and stick with after one round under the pump.
Our preferred argument against the money pump builds on these earlier ones as follows.
Notice the way in which the money pump story relies upon transitive thinking; it does not explicitly specify the perceived choice-set for each decision. Because the transitivity axiom 'tacitly assumes' contraction and expansion consistency, the money pump proceeds as if these choice sets do not need to be spelled out. The money pump implausibly requires the choices to be perceived and ranked as sequential binary sets rather than as a ternary set after one full sequence of offers.
But a decision maker with intransitive preferences should perceive a series of binary trades offered to him or her as a transformation of the choice set from binary sets (at each point in time) to a ternary set (across a short period of time). Remember we have seen that a preference from a ternary set is decidable when all three objects are in competition for selection, just as each binary set was decidable, but we have shown that contraction and expansion consistency need not hold between these sets. Given this, the decision maker should choose his or her most preferred element from the ternary set; once that object has been purchased, the cycle stops. Such a DM could be pumped for at most 2ε, one-third of the time, if two trades are needed to reach the top preference from the ternary set. The expected value of the pump is actually just ε and certainly would not lead to bankruptcy! This would constitute less of a 'Dutch book' than a maximum of two 'Dutch letters', clearly insufficient to attract an arbitrageur.
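The bound on the pump can be checked with a short Python sketch. It rests on one loudly stated assumption, implied by the "one-third of the time" in the text: each of the three lotteries is equally likely to be the DM's top choice from the ternary set. The names `OFFER_SEQUENCE` and `trades_until_top` are hypothetical, and costs are measured in units of ε.

```python
from fractions import Fraction

OFFER_SEQUENCE = ["B", "A", "C"]  # endowment first, then the order of offers

def trades_until_top(top: str, start: str = "B") -> int:
    """Trades accepted before a DM who stops at `top` leaves the cycle."""
    i, j = OFFER_SEQUENCE.index(start), OFFER_SEQUENCE.index(top)
    return (j - i) % len(OFFER_SEQUENCE)

# Assumption for this sketch: each lottery is equally likely to be the top ternary choice.
costs = [trades_until_top(top) for top in OFFER_SEQUENCE]  # in units of epsilon
expected_cost = sum(Fraction(c, len(costs)) for c in costs)
print(costs, max(costs), expected_cost)  # [0, 1, 2] 2 1
```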

Conclusion: Predictably Intransitive?
In summary, where do our arguments leave the most popular theories of choice under risk, the bedrock for many models of economic behaviour, which assume transitivity? The Steinhaus-Trybula Paradox suggests that if three or more choice objects have at least three attributes, which are evaluated in binary and ternary choice sets rather than separately, intransitive preferences can be descriptively and normatively valid. But the domain of the STP is tightly restricted. Indeed, rather than needing to discard all transitive models, one implication is instead simply to take more care in deciding where to apply those models, as the domain of application is substantial but not, as we have shown, universal. Our main contributions are to flesh out the conditions supporting the STP, identify sets of choice objects that meet it, then use the findings as a recipe to construct lotteries typical of the decision theory literature, under standard preference elicitation incentives. Both the resemblances of STP objects to PR lotteries, as well as the differences from them, were an important discovery.
An innovative feature of our experiment was eliciting preferences in the ternary choice sets as well as the binary sets to investigate choice-set dependent preferences. Results support our conjectures for cycles beyond simply noise or error in implementing latent transitive preferences. This is a remarkable finding given the strong consensus in economics and psychology behind the transitivity axiom that exists today. We saw that very little solid evidence for preference cycles has been shown to exist until now. This 'absence of evidence' has implicitly been viewed as 'evidence of the absence' of preference cycles, such that future theory development can safely invoke the transitivity axiom in descriptive models. The STP, the bespoke lotteries it inspired, the results of the experiment reported here and our response to the money pump critique together demonstrate the flaws in this assumption.
By weakening transitivity to apply only in a choice-set specific manner, the axiom is essentially stripped of its normative force, its descriptive and predictive power and its ability to establish the existence of an expected utility representation in combination with other axioms. There would seem to be little to be gained by adopting such a defence in our experiment; however, Muller-Trede et al (2015) make a plausible case for when the attributes and magnitude ranges are much less familiar than units of money.