Constant Function Market Makers (CFMMs) have emerged as an innovation in decentralized finance, providing an efficient and computationally cheap on-chain solution for exchanging digital assets without needing traditional order books. CFMMs have changed how assets are traded, enabling a more seamless and accessible marketplace, and have become the most widely used decentralized crypto product. However, as the popularity of decentralized finance grows, so does the need to address the privacy concerns arising from the transparent nature of blockchain-based transactions. In this article, we will delve into the nature of privacy in CFMMs and discuss it in reference to the works “A note on privacy in Constant Function Market Makers” by Angeris et al. and “Differential Privacy in Constant Function Market Makers” by Tarun Chitra et al.
Why Privacy? CFMMs give LPs looking for passive yield a way to offer their assets to willing traders. A CFMM works by keeping its “trading function” constant, i.e. the price is determined using this function. Privacy is primarily desired for three reasons:
LPs don’t get adversely selected against
Users have all their trades publicly visible to everyone and directly associated with their identity.
The public nature of transactions, of course, also allows third parties to front-run a user’s trades.
For example, in constant sum market makers, the quote is always a single price, so an adversary cannot discern whether a trade is of size 50 or 50 trades of size 1; however, the LPs get adversely selected against because anyone can buy large quantities while the price is decreasing elsewhere. Designing better CFMMs involves quantifying and qualifying the tradeoffs between the ideal privacy we desire from our CFMMs and the adverse selection that an LP can potentially face.
Although smart contracts that utilize zero-knowledge proof systems should be able to execute CFMM transactions privately, it has been shown in recent work, “Why can’t you build a private Uniswap with ZKPs”, that, informally and “heuristically”, the timing of a trade implicitly leaks privacy in Uniswap and other CFMMs. In simple words, the timing of a trade can help you reconstruct the traded values.
This paper shows that:
Even under relatively weak adversaries, CFMMs are generally unable to preserve privacy
It also proposes some mitigation mechanisms and starts a discussion around them to help improve user privacy
Constant Function Market Makers: A CFMM is an automated market maker defined by its reserve quantities R and a trading function ψ. Ultimately, the intuitive notion of a CFMM is very simple: an agent/user proposes a trade Δ, a vector signifying trade quantities. While the original CFMM paper uses two different tuples/vectors for inputs and outputs, in this case we denote Δ_i as a positive quantity if asset i is given to the CFMM and as negative if it is taken. The CFMM checks whether the trade satisfies the trading function, and if the trade is deemed valid, the reserves are updated. If you aren’t entirely familiar with CFMMs, you can take a look at this.
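As a minimal sketch of this propose-validate-update flow (assuming, purely for illustration, a fee-less pool with the Uniswap-style product function ψ(R) = R_1 R_2; the numbers are made up):

```python
# Minimal fee-less CFMM sketch (illustrative only; not any production AMM).
# Reserves R = [R1, R2]; delta[i] is positive when asset i is given to the
# CFMM and negative when it is taken out, matching the sign convention above.

def psi(R):
    """Uniswap-style product trading function, used purely as an example."""
    return R[0] * R[1]

def is_valid_trade(R, delta, tol=1e-9):
    """A trade is accepted if it keeps the trading function constant."""
    new_R = [R[0] + delta[0], R[1] + delta[1]]
    if min(new_R) <= 0:
        return False
    return abs(psi(new_R) - psi(R)) <= tol * psi(R)

def apply_trade(R, delta):
    if not is_valid_trade(R, delta):
        raise ValueError("trade rejected: trading function not preserved")
    return [R[0] + delta[0], R[1] + delta[1]]

R = [100.0, 100.0]                               # psi(R) = 10_000
delta = [25.0, 100.0 * 100.0 / 125.0 - 100.0]    # give 25 of asset 1, take 20 of asset 2
print(apply_trade(R, delta))                     # [125.0, 80.0]
```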
We assume that the trading function ψ is strictly concave, which holds for most AMMs of interest except constant sum market makers and other special cases. The marginal price c of a fee-less CFMM is given by the gradient condition ∇ψ(R) = λc, where λ > 0.
Assume that Eve is the adversary trying to discover the quantity traded by Alice. Eve is unable to see the exact quantities traded by Alice but knows when Alice’s transaction took place.
Eve is able to query the marginal price of the CFMM at the current reserves and to check whether a given trade Δ is valid. Eve can also query the CFMM before and after the transaction. This attack fails if the transaction time is obfuscated, but such protocols don’t exist today.
Before we talk about the sequence of the attack, it is important to understand that Eve can always reconstruct the reserve amounts given the marginal price at the current system reserves and a single nonzero feasible trade. Once this is obtained, it is enough to compute the reserve amounts before and after Alice’s trade to recover the traded amounts.
The sequence of the attack is as follows:
Eve queries the marginal price of the CFMM at the current reserves to get some vector c and then finds any valid nonzero trade Δ ≠ 0.
Knowing the functional form of the trading function ψ, Eve solves the following system to find R (the reserves)
Let Δa be Alice’s trade, and the new reserves be R_a = R + Δa (This is not known to Eve, but the CFMM can now be queried in this new state)
After the trade Δa, Eve queries the contract again to get a new marginal price c’ and then queries another nonzero trade Δ’. She can solve the following system of equations to obtain the reserves R_a.
Reserve Discovery in Uniswap
Ultimately, following this proof and plugging in the respective equations for Uniswap, we find that having only the marginal price c and a nonzero trade Δ is enough to recover the Uniswap reserves. While this is a special case, it is also possible to reconstruct the reserves using any two nonzero, distinct feasible trades.
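To make the Uniswap case concrete, here is a small sketch (assuming the fee-less product invariant R_1 R_2 = k, a marginal price c = R_2/R_1, and made-up numbers) showing how the marginal price plus a single feasible trade pins down the reserves in closed form:

```python
# Sketch: recovering Uniswap-style reserves from the marginal price and one
# feasible trade (fee-less product invariant R1 * R2 = k; numbers are made up).

def recover_reserves(c, d1, d2):
    """c  : pre-trade marginal price R2 / R1 (asset 2 per unit of asset 1)
       d1 : amount of asset 1 tendered to the pool
       d2 : amount of asset 2 received from the pool
       Solves R2 = c * R1 together with (R1 + d1) * (R2 - d2) = R1 * R2."""
    R1 = d1 * d2 / (c * d1 - d2)
    return R1, c * R1

# Ground truth, hidden from Eve:
R1, R2 = 1000.0, 2000.0
c = R2 / R1                        # 2.0, read off the contract before the trade
d1 = 10.0
d2 = R2 - (R1 * R2) / (R1 + d1)    # output of one feasible trade (~19.8)

print(recover_reserves(c, d1, d2))  # -> (1000.0, 2000.0) up to floating point error
```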
Future extensions of the proof
where γ is the trading fee.
Now that we have established that providing privacy directly in CFMMs is impossible as long as the timing of the trade is emitted, i.e. known, we’ll talk about a few modifications that can provide some modicum of privacy.
Randomness in price - Eve requires knowledge of the marginal price c in order to reconstruct the reserves, so adding some amount of randomness can prevent Eve from reconstructing the reserve values. Controlling this randomness is very difficult: it needs to be applied consistently, and any pattern can probably be deciphered by querying multiple trades. Moreover, if the difference introduced by the randomness is large, it will quickly be exploited by arbitrageurs, resulting in an additional loss of liquidity for the LPs. In simple words, randomness in price forces worse prices on the end users.
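As a rough illustration of why this is costly (the uniform noise model and all numbers below are assumptions made for the sketch, not taken from the paper): an arbitrageur who only trades when the randomized quote is favorable extracts value that grows with the noise scale.

```python
# Sketch: zero-mean noise on the quoted price hands arbitrageurs a free option.
# Purely illustrative noise model; the value leaked grows with the noise scale.
import random

def expected_buy_side_edge(noise_scale, fair_price=100.0, trials=100_000):
    """Arbitrageur buys one unit only when the noisy quote is below fair value."""
    edge = 0.0
    for _ in range(trials):
        quote = fair_price + random.uniform(-noise_scale, noise_scale)
        edge += max(fair_price - quote, 0.0)
    return edge / trials

for scale in (0.0, 1.0, 5.0):
    print(scale, round(expected_buy_side_edge(scale), 3))
# 0.0 -> 0.0, 1.0 -> ~0.25, 5.0 -> ~1.25: more randomness, more value leaked.
```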
Batching Orders - The CFMM waits until several trades Δ are accepted and updates its reserves only after all trades are executed. Batching orders trades off against the performance of the system and has an immediate degrading effect on the user experience. The CFMM also has to ensure that not all of the trades (except Alice’s) are by Eve because, in that case, Eve would see the reserves before and after the batch, giving a much simpler way of recovering the values Alice traded. In simple words, batched orders add latency for the end users.
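A sketch of the batching idea on the same toy product-function pool as before (illustrative only): an observer who sees only the pre- and post-batch reserves can recover the net flow of the whole batch, but not how it was split among traders.

```python
# Sketch: batch execution on the toy product pool. Trades are executed
# internally, but an observer only sees the reserves before and after the
# whole batch, so individual trade sizes hide among the batch.

def swap_asset1_for_asset2(R, amount_in):
    """Trade `amount_in` of asset 1 into the pool, preserving R1 * R2."""
    k = R[0] * R[1]
    new_R1 = R[0] + amount_in
    amount_out = R[1] - k / new_R1
    return [new_R1, R[1] - amount_out], amount_out

R = [1000.0, 2000.0]
R_before = list(R)
batch = [5.0, 12.0, 3.0]            # Alice's trade hides among the others
for amount in batch:
    R, _ = swap_asset1_for_asset2(R, amount)
R_after = R

# Eve, seeing only R_before and R_after, can recover the *net* input (20.0 of
# asset 1) but not how it was split across the batch -- unless every other
# trade in the batch is her own.
print(R_before, R_after)
```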
We’ve now established what we expect from CFMMs when we talk about privacy, the issues we face, and some of the mitigation strategies. At a high level, privacy in CFMMs is as simple as preventing anyone from figuring out the trades (sizes and prices) of a user. A natural question to ask is: what kind of tradeoff is possible between privacy and pricing? What kind of guarantees can we give when we design a CFMM with privacy-first attributes? And what are the ways we can achieve this?
In the paper “Differential Privacy in Constant Function Market Makers”, Tarun Chitra et al. sought to answer two major questions:
What is the minimum number of samples n(δ) such that an adversary is unable to infer the actual sizes of the trades beyond a precision δ? In other words, what should the size of a batch be?
How much worse is the worst price offered to a user via such a mechanism? In other words, what does the worst trade execution price within a batch look like?
The paper establishes the following:
Quantifies the trade-off between pricing and privacy in CFMMs
Analyzes a simple privacy mechanism called Uniform Random Execution and proves that it provides (ϵ,δ) differential privacy, where ϵ depends on the curvature of the CFMM and the number of trades executed.
Concludes that, for CFMMs with non-zero curvature, one cannot do better than Uniform Random Execution when it comes to providing privacy.
Before we look at various aspects of the paper, we’ll take a look at some definitions again:
CFMM State: The reserves of a CFMM are updated using the following formula, where Δ is transferred from the trader to the DEX and Λ is transferred from the DEX to the trader. The pair (Δ, Λ) makes up the “proposed trade”, which is initiated by a user. Δ signifies the tender basket, and Λ signifies the received basket.
Trading Function: As we saw earlier in this article, a proposed trade (Δ, Λ) is only accepted when ψ(R + γΔ − Λ) = ψ(R).
In the earlier section of the article, we only saw R + Δ instead of denoting the tender and received baskets separately; that is also because here we consider fees explicitly: the term γ (< 1) signifies the trading fee. The term “constant function” for a CFMM comes from the above condition: regardless of whether a trade is executed, the value of the trading function remains constant.
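A minimal sketch of this acceptance rule (using the same toy trading function as before, in its concave square-root form, and a made-up fee of 0.3%):

```python
# Sketch of the fee-aware acceptance rule psi(R + gamma*Delta - Lambda) = psi(R),
# using the concave square-root form of the toy product trading function.
import math

def psi(R):
    return math.sqrt(R[0] * R[1])

def accepts(R, tender, received, gamma=0.997, tol=1e-9):
    """tender = Delta (given to the pool), received = Lambda (taken out)."""
    probed = [R[i] + gamma * tender[i] - received[i] for i in range(2)]
    return abs(psi(probed) - psi(R)) <= tol * psi(R)

R = [1000.0, 2000.0]
tender = [10.0, 0.0]
# Maximum feasible output for this tender: keep R1*R2 constant after
# crediting only gamma * tender to the pool.
received = [0.0, 2000.0 - (1000.0 * 2000.0) / (1000.0 + 0.997 * 10.0)]
print(accepts(R, tender, received))   # True
```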
Something to note about the trading function again, as earlier, is that we assume ψ is strictly concave. Quickly taking an example:
The trading function for Uniswap is generally written as ψ(R) = R_1 R_2, which is neither convex nor concave, but it can be equivalently written as ψ(R) = √(R_1 R_2), which is strictly concave whenever R > 0.
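A quick numeric midpoint check of this claim (illustrative points only):

```python
# Quick midpoint check (illustrative points): psi1(R) = R1*R2 violates both
# midpoint concavity and midpoint convexity, while psi2(R) = sqrt(R1*R2)
# satisfies the concavity inequality at the same points.
import math

psi1 = lambda R: R[0] * R[1]
psi2 = lambda R: math.sqrt(R[0] * R[1])

def midpoint_gap(f, a, b):
    mid = [(a[i] + b[i]) / 2 for i in range(2)]
    return f(mid) - (f(a) + f(b)) / 2   # >= 0 at every pair of points for a concave f

print(midpoint_gap(psi1, [1, 1], [3, 3]))   # -1.0 -> psi1 is not concave
print(midpoint_gap(psi1, [1, 9], [9, 1]))   # 16.0 -> psi1 is not convex either
print(midpoint_gap(psi2, [1, 9], [9, 1]))   #  2.0 -> consistent with concavity
```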
Curvature: The marginal price for a trade size of Δ for a CFMM is:
Here ∂_1 and ∂_2 are the partial derivatives with respect to the first and second arguments. g is known as the price impact function, as it represents the final (marginal) price after a trade.
There are two CFMM properties that we care about: μ-stability and κ-liquidity. They are satisfied if
Basically, μ-stability means that a non-negative trade of size Δ does not change the market price by more than μΔ; it places a linear upper bound on the maximum price impact that a bounded trade can have. Similarly, κ-liquidity means that there is some price slippage when a token is sold, and it is linearly bounded from below by κΔ.
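As a rough numerical illustration (the price impact g(Δ) = R_1 R_2/(R_1 + Δ)² used here is the fee-less Uniswap case with made-up reserves; the paper's definition of g also accounts for fees), the change in marginal price stays sandwiched between κΔ and μΔ on a bounded interval:

```python
# Numerical illustration of mu-stability / kappa-liquidity on a toy fee-less pool.
# Price impact used here: g(d) = R1*R2 / (R1 + d)^2, the post-trade marginal
# price of asset 1 after d units of asset 1 are sold into the pool.
R1, R2, M = 1000.0, 2000.0, 200.0
k = R1 * R2

g = lambda d: k / (R1 + d) ** 2

mu = 2 * k / R1 ** 3             # steepest slope of the impact, at d = 0
kappa = (g(0) - g(M)) / M        # shallowest (average) slope over [0, M]

for d in (1.0, 50.0, 100.0, 200.0):
    impact = g(0) - g(d)         # how much the marginal price has moved
    assert kappa * d <= impact + 1e-12 and impact <= mu * d + 1e-12
    print(d, round(impact, 6))   # the move is sandwiched between kappa*d and mu*d
```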
A small digression to understand where differential privacy comes from and what the progression towards it looked like. Early approaches relied on plain data anonymization, i.e. stripping personally identifying fields from a dataset, which turned out to be insufficient. The progression after that:
K-anonymity: To address the limitations of data anonymization, k-anonymity was proposed. In a k-anonymized dataset, each record is indistinguishable from at least k-1 other records with respect to certain attributes. However, k-anonymity has its limitations, as it does not protect against attacks based on the distribution of sensitive attributes.
L-diversity: L-diversity extends k-anonymity by ensuring that each group of records sharing the same attributes also contains at least "l" distinct values for the sensitive attributes. This provides stronger privacy guarantees but still has limitations in protecting against attacks that exploit background knowledge.
T-closeness: To further strengthen privacy, t-closeness was introduced, which requires the distribution of sensitive attributes in each group of records to be close to the overall distribution of those attributes in the entire dataset. This makes it more difficult to infer an individual's sensitive attributes based on their non-sensitive attributes.
Differential privacy: Differential privacy emerged as a more robust privacy framework that mathematically guarantees an individual's privacy. It achieves this by adding noise to data queries, ensuring that the presence or absence of any individual data point does not significantly change the outcome. This makes it extremely difficult for adversaries to infer sensitive information about individuals from the data, even with background knowledge.
Let’s understand differential privacy with the help of an example. Let’s say you have a group of friends who want to know the average age of the group, but no one wants to reveal their exact age. To solve this problem while maintaining privacy, you can use a simple form of differential privacy.
Each person adds random noise (e.g., a random number between -5 and 5) to their age. This altered age is called the "noisy age."
Everyone shares their noisy age with the group.
The group calculates the average of the noisy ages.
Because the noise is drawn with mean zero, it roughly cancels out in the average, so the noisy average the group calculated is close to the true average age (and the more friends there are, the closer it gets).
Now, the group has a good estimate of the average age without revealing any individual’s exact age. Even if someone tries to guess a person’s age from their noisy age, it would be difficult because of the random noise added to it.
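A tiny sketch of that example (the ages and noise range are made up):

```python
# Sketch of the noisy-age example: each person adds independent zero-mean
# noise before sharing, so no exact age is revealed, yet the group average of
# the noisy ages stays close to the true average.
import random

ages = [23, 31, 27, 45, 38, 29, 52, 34]            # private values
noisy = [a + random.uniform(-5, 5) for a in ages]  # what each person reveals

true_avg = sum(ages) / len(ages)
noisy_avg = sum(noisy) / len(noisy)
print(round(true_avg, 2), round(noisy_avg, 2))     # close, yet no single age leaks
```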
Ultimately, privacy frameworks like these try to anonymize information: they obscure data that could personally identify someone while keeping the aggregate statistics of the dataset roughly the same.
One way of guaranteeing differential privacy is ε-differential privacy (which was introduced in the original paper; more methods have come out since then, but most are relaxations of the ε condition). Let ε be a positive real number and A be a randomized algorithm that takes a dataset as input (representing the actions of the trusted party holding the data). The algorithm A is said to provide ε-differential privacy if, for all datasets D_1 and D_2 that differ on a single element (i.e., the data of one person) and for all sets of outputs O, the following condition holds:
Pr[A(D_1) ∈ O] ≤ e^ε · Pr[A(D_2) ∈ O]
where
Pr[A(D_1) ∈ O] is the probability that algorithm A produces an output in the set O when applied to dataset D_1.
Pr[A(D_2) ∈ O] is the probability that algorithm A produces an output in the set O when applied to dataset D_2.
ε (epsilon) is a non-negative value representing the privacy loss parameter. Smaller ε values correspond to stronger privacy guarantees.
In other words, ε-differential privacy ensures that the presence or absence of any individual's data does not significantly change the probability distribution of the algorithm's output.
(ϵ, δ) differential privacy is a relaxation of the standard ε-differential privacy definition. It provides a more flexible privacy guarantee by introducing an additional parameter δ, which allows a small probability of a slightly larger privacy breach. A randomized algorithm A is said to provide (ϵ, δ)-differential privacy if, for all datasets D_1 and D_2 that differ on a single element (i.e., the data of one person) and for all sets of outputs O, the following condition holds:
Pr[A(D_1) ∈ O] ≤ e^ϵ · Pr[A(D_2) ∈ O] + δ
where the terms are the same as above, with the addition of the slack term δ.
As we saw towards the end of discussing the first paper, there are two ways to mitigate Eve’s attack model:
Randomizing Price
Batching orders
We now discuss the rest of the work in “Differential Privacy in Constant Function Market Makers” by Tarun Chitra et al. The paper presents a threat model (the Eve model from the earlier paper), constructs two solutions, and discusses various aspects of these solutions and the bounds we might expect.
The threat model: Eve attempts to discover the quantities that Alice has traded. In this model, we go further and assume that Eve wants to discover all the quantities traded by a set of agents. Eve is unable to see the quantities but knows that the traders’ transactions Δ_1, Δ_2, …, Δ_n were executed; Eve does not know their order. The goal is to estimate the order and the sizes. Eve’s only ability is to interact with the CFMM to check whether a trade is valid and to compute the marginal spot price (which, in our earlier case with Alice, was done by accessing the state of the CFMM before and after the transactions of some trader).
One of the simple ways to introduce randomness into a CFMM is to randomly permute the set of trades to be executed.
Suppose we are given a vector of valid trades,
For a trade vector Δ, the above condition will be referred to as A_ψ(Δ).
The SURE mechanism draws a uniformly random permutation π
and constructs a sequence of trades:
Now consider the original trades’ marginal prices p_1, p_2, p_3, …, p_n and the permuted prices p^π_1, p^π_2, …, p^π_n. We want to bound the maximum deviation between the true price of a trade p and the permuted price p^π. What does this mean? Say the original trade prices were [10, 20, 30, 40, …] and the permuted execution yields the prices [40, 30, 10, 20, …]. The first trade now executes at a price 4x worse for that user, so when constructing any such ordering, it is paramount to bound this maximum deviation. This maximum deviation is given as follows:
To reiterate, this quantity corresponds to a bound on the worst quoted price that a trader can get. We also want to capture and discuss the difficulty the adversary faces in learning the permutation π that was chosen, given only the prices p^π_1, …, p^π_n.
Let’s consider two scenarios in which we can talk about the impact of this deviation.
One, if all the trade sizes are unique, then beyond some precision it is very difficult to compute the original trade order, since every permutation produces a distinct outcome. What could the worst price impact look like? In this case, SURE works pretty well, and the maximum deviation stays bounded, since the probability of a long run of trades in the same direction (only buys or only sells) is very low.
Two, now consider that a subset of the trades is similar, i.e. of essentially the same size. For example, suppose that among n trades, the first trade has size 100 and all the other trades have size 1. In the original order, everyone gets the adjusted price after the big trade, while in any other permutation, every trade executed before the large trade of size 100 gets a better price, and every trade executed after it gets a worse price than before. Therefore, SURE requires that the trade distribution has sufficient entropy and isn’t too concentrated (as we saw, a large trade makes the randomization give a worse price to everyone after it and a better one to everyone before it). Recall the two CFMM properties (μ-stability and κ-liquidity) that we said we care about earlier.
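A small simulation of this second, concentrated-trade scenario (toy pool and trade sizes assumed; this is not the paper's experiment):

```python
# Sketch of SURE on a toy product pool: execute a batch in a random order and
# compare each trader's realized price against the original ordering.
import random

def prices_by_trader(order, sizes, R1=10_000.0, R2=20_000.0):
    """Execute trades (asset 1 in) in `order`; return realized price per trader."""
    k, price = R1 * R2, {}
    for i in order:
        d = sizes[i]
        received = R2 - k / (R1 + d)
        price[i] = received / d          # asset 2 received per unit of asset 1
        R1, R2 = R1 + d, R2 - received
    return price

sizes = [100.0] + [1.0] * 20             # one dominant trade, as in the text
original = prices_by_trader(range(len(sizes)), sizes)

worst_deviation = 0.0
for _ in range(500):
    perm = list(range(len(sizes)))
    random.shuffle(perm)                 # SURE: a uniformly random ordering
    permuted = prices_by_trader(perm, sizes)
    worst_deviation = max(worst_deviation,
                          max(abs(permuted[i] - original[i]) for i in permuted))

print(worst_deviation)  # dominated by whoever lands just before/after the big trade
```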
Note: Skip to Uniform Random Execution and just read the paragraph before it if you don’t want to delve into the mathematical treatment of finding the bounds
We will try to bound the maximum of the price process using these curvature bounds and random binary trees. We analyse SURE on a subset of the total set of allowable input trades. Suppose the price impact function g is μ-stable and κ-liquid on an interval [-M, M]. This implies
which then bounds the partial sums of permuted trades by
We define the partial sum as
and construct a binary search tree T by inserting the partial sums in order: the root is R_1, the partial sum of the first permuted trade, then R_2 is the partial sum of the first two trades, R_3 of the first three, and so on. An example of such a tree is sketched below:
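The paper's figure is not reproduced here; as a rough illustration, the following sketch inserts the partial sums of a randomly permuted batch of toy trades into a plain binary search tree and reports its height (under the paper's assumptions, this height is O(log n)).

```python
# Sketch (not the paper's figure): build the BST of partial sums R_1, ..., R_n
# of a randomly permuted batch and report its height. Toy trade sizes.
import random

def bst_height(values):
    """Insert values in the given order into an unbalanced BST; return its height."""
    children = {}                        # value -> [left_child, right_child]
    root, height = None, 0
    for v in values:
        if root is None:
            root, children[v] = v, [None, None]
            continue
        node, depth = root, 1
        while True:
            side = 0 if v < node else 1
            if children[node][side] is None:
                children[node][side] = v
                children[v] = [None, None]
                height = max(height, depth)
                break
            node, depth = children[node][side], depth + 1
    return height

trades = [random.uniform(-5, 5) for _ in range(1024)]
random.shuffle(trades)                   # SURE's random permutation
partial_sums, s = [], 0.0
for d in trades:
    s += d
    partial_sums.append(s)               # R_j = sum of the first j permuted trades

print(bst_height(partial_sums))
```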
This representation of partial sums as a tree provides a natural geometric description of the maximum price deviation. The maximum partial sum max_i R_i(Δ, π) is a leaf node in this tree. By using curvature and the tree structure, the problem of bounding the maximum price deviation (a continuous problem) is turned into a combinatorial one about a random search tree. This modifies our earlier bounds to
We know from the paper “The height of a random binary search tree” by Reed (2003) that if a random binary search tree has unique elements, then its expected height is α ln n − β ln ln n + O(1), where α ≈ 4.31107 is the unique solution on the interval [2, ∞) of the equation α ln(2e/α) = 1, β = 3/(2 ln(α/2)), and ln denotes the natural logarithm. The variance of the height of the tree is constant.
Plugging this in the bounds we found earlier, we find that,
and
This implies that an adversary cannot determine a trade size with precision greater than Ω(κ), since there is always a minimum price discrepancy of Ω(κ log n). On the other hand, the upper bound on the price deviation means that the mechanism will not cause too great a price impact for users. There are some assumptions here that we should note. We assume that every ordering of the partial sums R_j(Δ, π) is equiprobable, but this doesn’t hold when one of the trades is much larger than all the others. And the expected-height result only applies when all the elements of the binary tree are unique. This means that for SURE to work:
All partial sums should be unique, and no single trade should be much larger than all the others.
To address this, the authors propose Uniform Random Execution (URE), which adds two things to the SURE algorithm:
Adding noise independent of Δ, μ, κ
Splitting trades
The Uniform Random Execution mechanism is governed by three parameters
c_min: Lower bound on Δ_min, i.e. the minimum price deviation
s: Split threshold that controls the average chunk size for a big trade
k: Multiple of (1 + s)Δ_min that requires splitting
To guarantee the following,
We add Laplace noise. We do this by constructing random variables ξ_1, …, ξ_n drawn i.i.d. (independent and identically distributed) from a distribution that can depend on a particular Δ; each trade Δ_i is then replaced with the noised trade Δ_i + ξ_i.
The upper bound, as seen above, can be reduced by splitting trades. This reduces the maximum and also increases the privacy of SURE. Instead of splitting trades in two, we split each trade into m(Δ_i) pieces, where m(Δ_i) is defined as
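The paper's exact definition of m(Δ_i) and of the noise distribution is not reproduced here; as a rough illustration of the overall shape of URE (the splitting rule and parameters below are made up), the mechanism adds Laplace noise, splits large trades into chunks, and then shuffles everything:

```python
# Sketch of the shape of URE: add Laplace noise to each trade, split trades that
# exceed a threshold into roughly equal chunks, then execute in a random order.
# The splitting rule and parameters below are hypothetical, for illustration only.
import random

def ure_prepare(trades, noise_scale=0.5, split_threshold=10.0):
    prepared = []
    for d in trades:
        # Laplace(noise_scale) noise as a difference of two exponentials.
        xi = random.expovariate(1 / noise_scale) - random.expovariate(1 / noise_scale)
        noisy = d + xi
        m = max(1, int(abs(noisy) // split_threshold))   # hypothetical m(Delta_i)
        prepared.extend([noisy / m] * m)                 # split into m equal chunks
    random.shuffle(prepared)                             # SURE step: random ordering
    return prepared

print(ure_prepare([100.0, 1.0, 2.0, 3.0]))
# The size-100 trade becomes ~10 chunks of ~10, mixed in among the small trades.
```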
URE satisfies Differential privacy: Two claims are proven in the paper:
Splitting is differentially private
SURE is differentially private, i.e. supposing we have a sequence of admissible trades Δ such that the height of the random search tree is O(log n) and all trade sizes are unique, then randomly permuting the trades can be made into a (μ log n, δ)-differentially private algorithm.
Ultimately, this helps us show that URE is (ϵ, δ) differentially private with
These results suggest that permuting and splitting up trades is a simple and viable mechanism for adding differential privacy to CFMMs.
One thing to note is that we can achieve this differential privacy today on any chain that has a verifiable random function, because it can generate the noise to add and also randomly permute the ordering of the trades. As we saw in our original example of differential privacy, we need a randomness source to obscure the data, and doing this in a trustless way is only possible with verifiable random functions (VRFs) or VDFs (verifiable delay functions) that serve as randomness beacons.
Deriving from the works “Differential Privacy in Constant Function Market Makers” and “Improved Price Oracles: Constant Function Market Makers”, the following things can be observed:
In the differential privacy paper, the authors show the trade-off between the cost of privacy and the level of privacy: splitting a trade into smaller trades and shuffling the order of transactions leads to better privacy for users while worsening their price impact.
In the price oracle paper, it is demonstrated that splitting trades and executing them on a CFMM leads to worse utility, i.e. worse price execution (the user pays a higher price for the same quantity of an asset).
These results let us conclude that CFMMs trade off privacy for utility.