New subject: Proposal xxx: Filtering malicious rendezvous points at hidden service server side

23 Jan 2016

      -----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256
Hello,
Inspired by asn, dgoulet and Mike Perry discussing alternate path
lengths for proposal 247, I wrote a proposal that could fit in nicely
with proposal 247, maybe even allow us to make the 3rd hop wild
(random) so we don't have to worry about HSDirs + HSDir fetch/post
circuits and introduction points + introduction points circuits
linking short term layer 3 guards.
Please find the proposal attached as well as inline for comments.
Looking forward for reviews.
If I've structured it wrong or something I apologize in advance - will
do better next time.
Filename: xxx-malicious-rendezvous-point-filter.txt
Title: Filtering malicious rendezvous points at hidden service server side
Authors: s7r [ 0x837FA52C81265B11 ]
Created: 23 January 2016
Status: Draft
0. Definitions
client = an user running Tor as a client, which connects to a hidden
service.
hidden service server = the Tor instance hosting the hidden service.
rendezvous point = the relay which acts as the rendezvous meeting
point in a circuit between a hidden service client and a hidden
service server.
1. Motivation
A hidden service circuit (rendezvous circuit) is always initiated by
the client so the relay which should act as the rendezvous meeting
point is selected by the client. We assume that any client can be an
attacker and will try to make the hidden service server connect to a
malicious rendezvous point under his control.

The attacker is also a Sybil (holds an unknown % of the bandwidth in
the Tor network). By making the hidden service server build many
circuits to his evil rendezvous points, the attacker gets a high
probability that the hidden service server will eventually pick his
evil relays in a circuit, so the attacker will trivially perform a
successful hidden service guard discovery attack or, with more luck,
discover the real location of the hidden service server.

The hidden service server can only defend itself by building a 3 hop
circuit to the rendezvous point, but in practice this is not always
enough.
2. Logic for filtering malicious rendezvous points
To be able to make fair assumptions about which rendezvous point might
be malicious or not we are going to use the probability theory and
statistics, precisely the binominal distribution equation [0].

In simple words, we count and keep track of how many rendezvous
circuits a hidden service server built and to which rendezvous points.
Then, based on the weight (middle probability fraction) of each
rendezvous point, we determine if one was insanely overpicked by
clients. Think of this like: Alice plays a game with Bob in which Bob
has the success probability to draw a certain number n%. In practice
however, he shows a success rate of more than (n*2)% (2 times bigger),
Alice assumes Bob is not playing nice and will refuse to play with him
for the next 24 hours or so.

Fast example:
A hidden service server built 100 rendezvous circuits in the last 24
hours, out of which in 7 of them the rendezvous point was a relay with
the middle probability fraction of 0.5%. We assume that rendezvous
point is malicious and refuse to build rendezvous circuits to it for
the next 24 hours.

3. Equations and formulas
3.1 Probability of different clients genuinely picking the same
rendezvous point
n = total number of established rendezvous circuits.
k = number of clients that could have genuinely picked the same
rendezvous point.
p = probability of picking that rendezvous point per client (middle
probability fraction of the relay) divided to 100.
H = Probability mass function expanded.
C = The result of the equation, the value we need to determine
multiplied with 100 to have the percent value.

Formulas:
H = n! / k!^(n - k)!
C = H * p^k * (1 - p)^n - k

Example:
n = 100
k = 9
p = 0.5% / 100 = 0.005

H = 100! / 9!^91! = 20023492720
C = 20023492720 * 0.000000000000000000001953125 * 0.63372, ~
0.00000000002478376524710625 * 100, ~ 0.000000002478376524710625%

3.2 Expected number of clients to pick the same rendezvous point
n = Total number of established rendezvous circuits.
p = Probability of picking that rendezvous point per client (middle
probability fraction of the relay) divided to 100. We multiply this
with 2 intentionally so we will allow 2 times more circuits than we
would expect based on this formula - this should reduce the error
margin enough.
C = The result of the equation, the value we need to determine.

Formula:
C = n * p

Example:
n = 100
p = 0.5% / 100 = 0.005 * 2 = 0.01
C = 100 * 0.01 = 1

4. Implementation and variables
We keep track in the persistent state of a hidden service server of
the total number of built rendezvous circuits as well as a counter for
number of rendezvous circuits per rendezvous point.

Variables:
REND_FILTER_MONITOR_PERIOD = count and track the rendezvous circuits
built within this latest period. We set it to 24 hours.
REND_FILTER_BAN_PERIOD = for how long we are going to ban a malicious
rendezvous point. We set it to 24 hours.

REND_FILTER_TOLERANCE = how many circuits we allow per rendezvous
point regardless of its middle probability fraction. We set it to 4.

REND_FILTER_BAN_ACTIVATION = when we should ban a malicious rendezvous
point. To determine this, we are going to get from
REND_FILTER_MONITOR_PERIOD the total number of established rendezvous
circuits as well as the number of established rendezvous circuits with
a given rendezvous point and apply the formula from 3.2 with
approximation to the next whole number (example: 5.01 = 6; 5.99 = 6).

Example: A hidden service server built 100 circuits in the last 24
hours out of which 5 to a rendezvous point with the middle probability
fraction 1% (p = 0.02), so the expected number of circuits we expect
is 2, but allow up to maximum 4 (if REND_FILTER_TOLERANCE >
REND_FILTER_BAN_ACTIVATION, REND_FILTER_TOLERANCE prevails). In this
case we consider this too much of a coincidence and ban this
rendezvous point for the next 24 hours. We keep this ban in our
persistent state with a timestamp so we'll know when it expires.

5. Ban malicious rendezvous points just for rendezvous circuits, not
for anything else
If we treat as relay as a malicious rendezvous point for a given time
period, this only means we are not building circuits with it as the
rendezvous point, but we don't exclude it for other purpose circuits,
regardless which are those other purpose circuits. We may use them in
our exit circuits, consensus fetch circuits, as HSDirs and HSDir fetch
/ post circuits, as Introduction Points or Introduction Point circuits.
6. Non zero probability to assign the malicious rendezvous point flag
to an innocent relay
This is indeed a non-zero value and it can be computed with the
formula from 3.1, but it's a value so close to zero that we think it's
totally worth it.

Even if accidentally (low chances) an innocent relay will be banned,
this will be something local to the hidden service server. It won't
affect that relay at all, nor how other client or hidden service
servers treat that relay. It has nothing to do with the network wide
consensus as well.

A honest client will always retry with a different rendezvous point,
so honest clients should not experience reachability issues.
7. Rendezvous points that aren't included in hidden service server's
consensus
Clients and hidden service servers might have different consensuses.
This means that a hidden service server can be genuinely asked to
build a rendezvous circuit to a rendezvous point for which it has no
middle probability fraction because it's not included in that
consensus. We should allow REND_FILTER_TOLERANCE number of circuits to
the rendezvous points for which we have no weights at all and if that
rendezvous point appears in a later consensus within the
REND_FILTER_MONITOR_PERIOD update its middle probability fraction
value to ensure the filter will activate and work properly once
REND_FILTER_TOLERANCE threshold was reached.

8. Security implications for maintaining such records in the
persistent state of a hidden service server
An attacker which compromises the location of a hidden service server
will access this additional information: total number of rendezvous
circuits built within the last REND_FILTER_MONITOR_PERIOD and to which
rendezvous points these circuits were. This tradeoff should be worth
it as well, since if the location of a hidden service server is
compromised it is game over anyway, and this additional information
shouldn't be so important to the attacker, except maybe for better
determining the popularity of that hidden service (this can be
determined many other ways, like the logs from the application running
behind the hidden service, network counter, datacenter/ISP level
counters and logs, etc.).
To avoid co-location linkability between different hidden services, we
should keep rendezvous circuits records and counters individually for
each hidden service.

The logic is designed to adjust by itself dynamically, based on the
popularity and usage of a hidden service. If it's a high profile
hidden service, experiencing 10000 connections per 24 hours, the
REND_FILTER_BAN_ACTIVATION will increase for every rendezvous point.

9. Default option, with possibility to turn off for testing
environments and experiments
We add a new torrc option named HiddenServiceRendFilter 0|1 and set it
to 1 by default. It can be turned off by setting it to 0, if this will
be desired in some testing environments or experiments.
10. Other considerations, limitations and compatibility with other
existing proposals

This proposal won't disable an attacker entirely, but it will make it
much harder and more expensive, given that the attacker will have to
run more malicious relays to act as rendezvous points, increasing the
capacity of the Tor network overall and directly the costs of the
attack. Also, the attacker will have to grow the popularity of a
hidden service so that he can establish more rendezvous circuits using
the same rendezvous points. This will at least be kind of hard if
other genuine clients pick the same rendezvous points and get noisy to
the hidden service operator (much more traffic, same number or
slightly more users on his service).

This proposal is compatible with all the other hidden service related
proposals to date. For tools like OnionBalance, each backend /
fallback Tor instance included in the master hidden service descriptor
will just transparently and individually maintain their own tracking
vales and ban list.

With proposal 260 - Rendezvous Single Onion Services this should be
disabled since it makes no sense. Set HiddenServiceRendFilter 0 in a
RSOS type hidden service.

This proposal should be quite helpful once we implement enough padding
between relays, where an attacker might have to own the entire path in
order to distinguish dummy traffic from real traffic.

This proposal should fit in nicely with proposal 247 - Vanguards and
could make it safe to leave the 3rd hop wild (no layer 3 short term
guards) and use only layer 2 medium term guards + layer 1 long term
guard since an attacker will be able to establish less circuits to his
malicious rendezvous point, thus drastically decreasing the chances of
the hidden service server to also pick his malicious relay as 3rd hop
to the malicious rendezvous point (layer 2 guards discovery).
[0]: https://en.wikipedia.org/wiki/Binomial_distribution
...PGP SIGNATURE...
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (MingW32)

iQEcBAEBCAAGBQJWo/K4AAoJEIN/pSyBJlsRAnMIAKFDDFm0DICrog1JecmrO82y
N7dJf6tDURjGkUTr5i+mf91S4gTvxYigkT/tWTPH2E65XQGl2qLXo4d1+vgfaJmf
H7QxUFy2kt+oFV03pg7SsF1KlMLaIVjlPMC17hPjAL8wnLMPkTICUn+VqWVYqmq+
w4uFPncm42mhkd3EwRq3HCrwq4O7PasWfSBYXmAuIJl6CWDkZqBI+UUj56dqQG6c
zpWUMIZpfaMQzEJ438VhpVpiN19Ep5hIV5dJfz4MJ9eyhyas487nRXYyirEWqBs0
VXUsb/m/wgHICLjLukNYJ/+NZUk+Y5mIhL756HIiYUlL8UK85MaJhbozvjhqLnw=
=tMjA
-----END PGP SIGNATURE-----