Hi,
tldr:
- more outdated relays
(that is a claim I'm making and you could
easily prove me wrong by recreating the 0.3.3.x alpha
repos, shipping 0.3.3.7 in them, and seeing how things evolve
after a week or so)
- more work for the tpo website maintainer
- less happy relay operators [3][4]
- more work for repo maintainers? (since a new repo needs to be created)
When the tor 0.3.4 alpha repos (deb.torproject.org) first appeared on 2018-05-23
I was about to submit a PR for the website to include it in the sources.list
generator [1] on tpo but didn't do it because I wanted to wait for a previous PR to be merged first.
The outstanding PR got merged eventually (2018-06-28) but I still did not submit a PR to
update the repo generator for 0.3.4.x, and here is why.
Recently I was wondering: why are there so many relays running tor version 0.3.3.5-rc?
(see OrNetStats or Relay Search; > 3.2% CW fraction)
Then I realized that this was the last version the tor-experimental-0.3.3.x-*
repos were shipping before they got abandoned due to the new 0.3.4.x-* repos
(I can no longer verify this since those repos have been removed by now).
Peter made it clear in the past that the current way to
have per-major-version debian alpha repos (i.e. tor-experimental-0.3.4.x-jessie)
will not change [2]:
> If you can't be bothered to change your sources.list once or twice a
> year, then you probably should be running stable.
but maybe someone else would be willing to invoke an
"ln" command every time a new alpha repo is born:
tor-alpha-jessie -> tor-experimental-0.3.4.x-jessie
once 0.3.5.x repos are created the link would point to
tor-alpha-jessie -> tor-experimental-0.3.5.x-jessie
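A tiny script along these lines could repoint such an alias whenever a new
alpha series appears (purely a sketch; the repository path and suite names
below are made up for illustration):

  #!/usr/bin/env python3
  # Sketch only: repoint the "tor-alpha-*" aliases to the newest alpha series.
  # The DISTS path and SUITES list are assumptions, not the real repo layout.
  import os

  DISTS = "/srv/deb.torproject.org/dists"   # hypothetical repo root
  SERIES = "0.3.5.x"                        # the newly created alpha series
  SUITES = ["jessie", "stretch", "sid"]

  for suite in SUITES:
      alias = os.path.join(DISTS, "tor-alpha-" + suite)
      target = "tor-experimental-%s-%s" % (SERIES, suite)
      if os.path.islink(alias):
          os.remove(alias)                  # drop the old link
      os.symlink(target, alias)             # e.g. tor-alpha-jessie -> tor-experimental-0.3.5.x-jessie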
It is my opinion that this will help reduce the number of relays running
outdated versions of tor.
It will certainly avoid having to update the tpo website, which isn't a big task
and could probably be automated, but that isn't done currently.
"..but that would cause relay operators to jump from i.e. 0.3.3.x to 0.3.4.x alphas
(and break setups)!"
Yes, and I think that is better than relays stuck on an older version because
the former repo no longer exists and operators still can choose the old repos
which will not jump to newer major versions.
[1] https://www.torproject.org/docs/debian.html.en#ubuntu
[2] https://trac.torproject.org/projects/tor/ticket/14997#comment:3
[3] https://lists.torproject.org/pipermail/tor-relays/2018-June/015549.html
[4] https://trac.torproject.org/projects/tor/ticket/26474
--
https://twitter.com/nusenu_
https://mastodon.social/@nusenu
Hi,
every now and then I'm in contact with relay operators
about the "health" of their relays.
Following these 1:1 discussions and the discussion on tor-relays@
I'd like to raise two issues with you (the developers) with the goal
of helping improve relay operations and the end user experience in the long term:
1) DNS (exits only)
2) tor relay health data
1) DNS
------
Current situation:
Arthur Edelstein provides public measurements to tor exit relay operators via
his page at: https://arthuredelstein.net/exits/
This page is updated once daily.
The process for exit operators to use that data looks like this:
- first they watch Arthur's measurement results
- if their failure rate is non-zero, they try to tweak/improve/change their setup
- they wait another 24 hours (for the next measurement)
This is a somewhat suboptimal and slow feedback loop, and the data is probably
also less accurate and less valuable than what the tor
process itself could provide.
Suggestion for improvement:
Expose the following DNS status information
via tor's controlport to help debug and detect DNS issues on exit relays
(total numbers since startup):
- number of DNS queries sent to the resolver
- number of DNS queries sent to the resolver due to a RESOLVE request
- number of DNS queries sent to the resolver due to a reverse RESOLVE request
- number of queries that did not result in any answer from the resolver
- breakdown of the number of responses by response code (RCODE)
https://www.iana.org/assignments/dns-parameters/dns-parameters.xhtml#dns-pa…
- maximum number of DNS queries sent per circuit
If this causes a significant performance impact this feature should be disabled
by default.
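To illustrate how operators could consume such counters if they existed,
here is a rough sketch using stem; the "dns/*" GETINFO keys below are
invented for illustration and do not exist in current tor:

  #!/usr/bin/env python3
  # Sketch: poll hypothetical DNS counters over the controlport with stem.
  from stem.control import Controller

  HYPOTHETICAL_KEYS = [
      "dns/queries/total",
      "dns/queries/resolve",
      "dns/queries/resolve-reverse",
      "dns/queries/unanswered",
      "dns/responses/rcode",          # breakdown by RCODE
      "dns/queries/max-per-circuit",
  ]

  with Controller.from_port(port=9051) as controller:
      controller.authenticate()
      for key in HYPOTHETICAL_KEYS:
          try:
              print(key, "=", controller.get_info(key))
          except Exception as exc:    # key not implemented (yet)
              print(key, "-> not available:", exc)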
2) general relay health metrics
--------------------------------
Compared to other server daemons (webserver, DNS server, ..)
tor provides little data for operators to detect operational issues
and anomalies.
I'd suggest providing the following stats via the control port
(most of them are already written to logfiles by default but not accessible
via the controlport, as far as I've seen):
- total amount of memory used by the tor process
- number of currently open circuits
- circuit handshake stats (TAP / NTor)
DoS mitigation stats:
- number of circuits killed with too many cells
- number of circuits rejected
- number of marked addresses
- number of connections closed
- number of single hop clients refused
- number of closed/failed circuits broken down by their reason value
https://gitweb.torproject.org/torspec.git/tree/tor-spec.txt#n1402
https://gitweb.torproject.org/torspec.git/tree/control-spec.txt#n1994
- number of closed/failed OR connections broken down by their reason value
https://gitweb.torproject.org/torspec.git/tree/control-spec.txt#n2205
If this causes a significant performance impact this feature should be disabled
by default.
cell stats
- extra info cell stats
as defined in:
https://gitweb.torproject.org/torspec.git/tree/dir-spec.txt#n1072
This data should be useful to answer the following questions:
- High level questions: Is the tor relay healthy?
- is it hitting any resource limits?
- is the tor process under unusual load?
- why is tor using more memory?
- is it slower than usual at handling circuits?
- can the DNS resolver handle the volume of DNS queries tor is sending it?
This data could help prevent errors from occurring or provide
additional data when trying to narrow down issues.
When it comes to the question:
**Is it "safe" to make this data accessible via the controlport?**
I assume it is safe for all information that current versions of
tor write to logfiles or even publish as part of their extra-info descriptors.
Should tor provide this or similar data,
I'm planning to write scripts for operators to make use
of that data (for example a munin plugin that connects to tor's controlport).
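As a rough illustration of what such a plugin could look like, here is a
minimal munin-style sketch using stem and GETINFO keys that already exist
(traffic/read, traffic/written) plus a circuit count; the counters proposed
above would be emitted the same way once (if) they exist:

  #!/usr/bin/env python3
  # Minimal munin-style sketch: print a few values the controlport exposes today.
  from stem.control import Controller

  with Controller.from_port(port=9051) as controller:
      controller.authenticate()
      print("traffic_read.value", controller.get_info("traffic/read"))
      print("traffic_written.value", controller.get_info("traffic/written"))
      print("open_circuits.value", len(controller.get_circuits()))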
I'm happy to help write updates for control-spec should these features
seem reasonable to you.
Looking forward to hearing your feedback.
nusenu
--
https://twitter.com/nusenu_
https://mastodon.social/@nusenu
The unbearable situation with Google's reCAPTCHA
motivated this email (but it is not limited to this
specific case).
This idea came up when seeing a similar functionality
in unbound (which has it for a different reason).
Assumption: There are systems that block some tor exit
IP addresses (most likely the bigger ones), but they
are not blocked due to the fact that they are tor exits.
It just so happens that the IP got flagged
because of "automated / malicious" requests and IP reputation
systems.
What if every circuit had its "own" IP
address at the exit relay to avoid causing collateral damage
to all users of the exit if one was bad? (until the exit runs out of IPs and
starts to recycle previously used IPs again)
The goal is to avoid accumulating a bad "reputation" for the
single used exit IP address that affects all tor users
of that exit.
Instead of doing it on the circuit level you could do it
based on time. Change the exit IP every 5 minutes (but
do _not_ change the exit IPs for _existing_ circuits even if they
live longer than 5 minutes).
Yes, no one has that many IPv4 addresses, but with the
increasing availability of IPv6 at exits and destinations
this could be feasible to a certain extent, depending on
how many IPv6 addresses the exit operator has.
There are exit operators that have entire /48 IPv6 blocks.
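Just to illustrate the numbers involved: a /48 leaves 80 host bits, i.e.
2^80 possible exit addresses, so the address space itself is not the
bottleneck. A toy sketch of picking one address per circuit (tor does not
implement anything like this; the prefix below is the documentation prefix):

  #!/usr/bin/env python3
  # Illustration only: derive a distinct source address per circuit from a /48.
  import ipaddress
  import secrets

  EXIT_PREFIX = ipaddress.ip_network("2001:db8:1234::/48")
  print("addresses available:", EXIT_PREFIX.num_addresses)   # 2**80

  def address_for_new_circuit():
      """Pick a random address inside the prefix for a newly built circuit."""
      host_bits = 128 - EXIT_PREFIX.prefixlen
      offset = secrets.randbelow(2 ** host_bits)
      return EXIT_PREFIX.network_address + offset

  print("example circuit address:", address_for_new_circuit())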
problems:
- will not solve anything since reputation will shift to netblocks as well
(How big of a netblock are you willing to block?)
- you can easily tell two tor users apart from each other
even if they use the same exit (or more generally: you can
tell circuits apart). There might be all kinds of bad implications
that I'm not thinking of right now.
- check.tpo would no longer be feasible
- how do we still provide the list of exit IPs for easy blocking?
Exits could signal their used netblock via their descriptor. What if they don't?
(that in turn opens new kinds of attacks where an exit claims to be /0
and the target effectively blocks everything)
- more state to track and store at the exit
-...
some random thoughts,
nusenu
--
https://mastodon.social/@nusenu
twitter: @nusenu_
Hi!
I'm sending a new version of proposal 295 from Tomer Ashur, Orr
Dunkelman, and Atul Luykx. It's an updated version of their design
for an improved relay cell encryption scheme, to prevent tagging
attacks.
This proposal is checked into the torspec repository. I'm also
linking to a diagram for this scheme (and its latex source) from Atul
Luykx: https://people.torproject.org/~nickm/prop295/
Finally, I have a draft python reference implementation for an older
version of this proposal. I hope to be updating it soon and sending
out a link next week.
cheers! -- Nick
Filename: 295-relay-crypto-with-adl.txt
Title: Using ADL for relay cryptography (solving the crypto-tagging attack)
Author: Tomer Ashur, Orr Dunkelman, Atul Luykx
Created: 22 Feb 2018
Last-Modified: 1 March 2019
Status: Open
0. Context
Although Crypto Tagging Attacks were identified already in the
original Tor design, it was not before the rise of the
Procyonidae in 2012 that their severity was fully realized. In
Proposal 202 (Two improved relay encryption protocols for Tor
cells) Nick Mathewson discussed two approaches to stymie tagging
attacks and generally improve Tor's cryptography. In Proposal 261
(AEZ for relay cryptography) Mathewson puts forward a concrete
approach which uses the tweakable wide-block cipher AEZ.
This proposal suggests an alternative approach to Proposal 261
using the notion of Release (of) Unverified Plaintext (RUP)
security. It describes an improved algorithm for circuit
encryption based on CTR-mode which is already used in Tor, and an
additional component for hashing.
Incidentally, and similar to Proposal 261, this proposal employs
the ENCODE-then-ENCIPHER approach thus it improves Tor's E2E
integrity by using (sufficient) redundancy.
For more information about the scheme and a security proof for
its RUP-security see
Tomer Ashur, Orr Dunkelman, Atul Luykx: Boosting
Authenticated Encryption Robustness with Minimal
Modifications. CRYPTO (3) 2017: 3-33
available online at https://eprint.iacr.org/2017/239 .
For authentication between the OP and the edge node we use
the PIV scheme: https://eprint.iacr.org/2013/835
2. Preliminaries
2.1 Motivation
For motivation, see proposal 202.
2.2. Notation
Symbol Meaning
------ -------
M Plaintext
C_I Ciphertext
CTR Counter Mode
N_I A de/encryption nonce (to be used in CTR-mode)
T_I A tweak (to be used to de/encrypt the nonce)
T'_I A running digest
^ XOR
|| Concatenation
(This is more readable than a single | but must be adapted
before integrating the proposal into tor-spec.txt)
2.3. Security parameters
HASH_LEN -- The length of the hash function's output, in bytes.
PAYLOAD_LEN -- The longest allowable cell payload, in bytes. (509)
DIG_KEY_LEN -- The key length used to digest messages (e.g.,
using GHASH). Since GHASH is only defined for 128-bit keys, we
recommend DIG_KEY_LEN = 128.
ENC_KEY_LEN -- The key length used for encryption (e.g., AES). We
recommend ENC_KEY_LEN = 128.
2.4. Key derivation (replaces Section 5.2.2)
For newer KDF needs, Tor uses the key derivation function HKDF
from RFC5869, instantiated with SHA256. The generated key
material is:
K = K_1 | K_2 | K_3 | ...
where, if H(x,t) denotes HMAC_SHA256 with value x and key t,
and m_expand denotes an arbitrarily chosen value,
and INT8(i) is an octet with the value "i", then
K_1 = H(m_expand | INT8(1) , KEY_SEED )
and K_(i+1) = H(K_i | m_expand | INT8(i+1) , KEY_SEED ),
in RFC5869's vocabulary, this is HKDF-SHA256 with info ==
m_expand, salt == t_key, and IKM == secret_input.
When used in the ntor handshake a string of key material is
generated and is used in the following way:
Length Purpose Notation
------ ------- --------
HASH_LEN forward digest IV DF *
HASH_LEN backward digest IV DB *
ENC_KEY_LEN encryption key Kf
ENC_KEY_LEN decryption key Kb
DIG_KEY_LEN forward digest key Khf
DIG_KEY_LEN backward digest key Khb
ENC_KEY_LEN forward tweak key Ktf
ENC_KEY_LEN backward tweak key Ktb
DIGEST_LEN nonce to use in the *
hidden service protocol
* I am not sure that we need these any longer.
Excess bytes from K are discarded.
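[Illustrative sketch, not part of the proposal: the expansion above is just
HKDF-SHA256 expand from RFC5869, written out explicitly in Python.]

  import hashlib
  import hmac

  def H(x, t):
      """HMAC_SHA256 with value x and key t, as defined above."""
      return hmac.new(t, x, hashlib.sha256).digest()

  def kdf(key_seed, m_expand, n_bytes):
      """Return the first n_bytes of K = K_1 | K_2 | K_3 | ..."""
      k, k_i, i = b"", b"", 0
      while len(k) < n_bytes:
          i += 1
          k_i = H(k_i + m_expand + bytes([i]), key_seed)
          k += k_i
      return k[:n_bytes]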
2.6. Ciphers
For hashing(*) we use GHASH with a DIG_KEY_LEN-bit key. We write
this as Digest(K,M) where K is the key and M the message to be
hashed.
We use AES with an ENC_KEY_LEN-bit key. For AES encryption
(resp., decryption) we write E(K,X) (resp., D(K,X)) where K is an
ENC_KEY_LEN-bit key and X the block to be encrypted (resp.,
decrypted).
For a stream cipher, unless otherwise specified, we use
ENC_KEY_LEN-bit AES in counter mode, with a nonce that is
generated as explained below. We write this as Encrypt(K,N,X)
(resp., Decrypt(K,N,X)) where K is the key, N the nonce, and X
the message to be encrypted (resp., decrypted).
(*) The terms hash and digest are used interchangeably.
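[Illustrative sketch, not part of the proposal: the E/D and Encrypt/Decrypt
primitives above written with the Python "cryptography" package. The
Digest (GHASH) primitive is omitted here since that package does not expose
GHASH as a standalone function.]

  from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

  def E(K, X):
      """Single-block AES encryption E(K,X); X is one 16-byte block."""
      enc = Cipher(algorithms.AES(K), modes.ECB()).encryptor()
      return enc.update(X) + enc.finalize()

  def D(K, X):
      """Single-block AES decryption D(K,X)."""
      dec = Cipher(algorithms.AES(K), modes.ECB()).decryptor()
      return dec.update(X) + dec.finalize()

  def Encrypt(K, N, X):
      """AES-CTR with key K and 16-byte nonce N, as Encrypt(K,N,X) above."""
      enc = Cipher(algorithms.AES(K), modes.CTR(N)).encryptor()
      return enc.update(X) + enc.finalize()

  Decrypt = Encrypt   # CTR-mode encryption and decryption are identical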
3. Routing relay cells
3.1. Forward Direction
The forward direction is the direction that CREATE/CREATE2 cells
are sent.
3.1.1. Routing from the Origin
Let n denote the integer representing the destination node. For
I = 1...n+1, T'_{I} is initialized to the 128-bit string consisting
entirely of '0's. When an OP sends a relay cell, they prepare the
cell as follows:
The OP prepares the authentication part of the message:
C_{n+1} = M
T_{n+1} = Digest(Khf_n,T'_{n+1}||C_{n+1})
N_{n+1} = T_{n+1} ^ E(Ktf_n,T_{n+1} ^ 0)
T'_{n+1} = T_{n+1}
Then, the OP prepares the multi-layered encryption:
For I=n...1:
C_I = Encrypt(Kf_I,N_{I+1},C_{I+1})
T_I = Digest(Khf_I,T'_I||C_I)
N_I = T_I ^ E(Ktf_I,T_I ^ N_{I+1})
T'_I = T_I
The OP sends C_1 and N_1 to node 1.
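[Illustrative sketch, not part of the proposal: the OP-side preparation
above as straight-line Python. It reuses the E()/Encrypt() sketch from
Section 2.6; Digest() is stubbed with a truncated HMAC purely so the sketch
runs and is NOT the GHASH call the proposal specifies.]

  import hashlib
  import hmac

  def Digest(K, M):
      # Stand-in for GHASH(K,M); illustration only.
      return hmac.new(K, M, hashlib.sha256).digest()[:16]

  def xor16(a, b):
      return bytes(x ^ y for x, y in zip(a, b))

  def op_prepare_forward(M, hops, T_prime):
      """hops[i] = (Kf, Khf, Ktf) for node i+1; T_prime holds the running
      digests T'_1 ... T'_{n+1} (16 bytes each) and is updated in place."""
      n = len(hops)
      ZERO = bytes(16)

      # Authentication part (layer n+1), using the end node's keys:
      C = M
      T = Digest(hops[-1][1], T_prime[n] + C)
      N = xor16(T, E(hops[-1][2], xor16(T, ZERO)))
      T_prime[n] = T

      # Multi-layered encryption, I = n ... 1:
      for i in range(n - 1, -1, -1):
          Kf, Khf, Ktf = hops[i]
          C = Encrypt(Kf, N, C)
          T = Digest(Khf, T_prime[i] + C)
          N = xor16(T, E(Ktf, xor16(T, N)))
          T_prime[i] = T

      return C, N   # C_1 and N_1, sent to node 1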
3.1.2. Relaying Forward at Onion Routers
When a forward relay cell is received by OR I, it decrypts the
payload with the stream cipher, as follows:
'Forward' relay cell:
T_I = Digest(Khf_I,T'_I||C_I)
N_{I+1} = T_I ^ D(Ktf_I,T_I ^ N_I)
C_{I+1} = Decrypt(Kf_I,N_{I+1},C_I)
T'_I = T_I
The OR then decides whether it recognizes the relay cell as
described below. If the OR recognizes the cell, it processes the
contents of the relay cell. Otherwise, it passes C_{I+1}||N_{I+1}
along the circuit if the circuit continues.
For more information, see section 4 below.
3.2. Backward Direction
The backward direction is the opposite direction from
CREATE/CREATE2 cells.
3.2.1. Relaying Backward at Onion Routers
When a backward relay cell is received by OR I, it encrypts the
payload with the stream cipher, as follows:
'Backward' relay cell:
T_I = Digest(Khb_I,T'_I||C_{I+1})
N_I = T_I ^ E(Ktb_I,T_I ^ N_{I+1})
C_I = Encrypt(Kb_I,N_I,C_{I+1})
T'_I = T_I
with C_{n+1} = M and N_{n+1}=0. Once encrypted, the node passes
C_I and N_I along the circuit towards the OP.
3.2.2. Routing to the Origin
When a relay cell arrives at an OP, the OP decrypts the payload
with the stream cipher as follows:
OP receives relay cell from node 1:
For I=1...n, where n is the end node on the circuit:
C_{I+1} = Decrypt(Kb_I,N_I,C_I)
T_I = Digest(Khb_I,T'_I||C_{I+1})
N_{I+1} = T_I ^ D(Ktb_I,T_I ^ N_I)
T'_I = T_I
If the payload is recognized (see Section 4.1),
then:
The sending node is I. Stop, process the
payload and authenticate.
4. Application connections and stream management
4.1. Relay cells
Within a circuit, the OP and the end node use the contents of
RELAY packets to tunnel end-to-end commands and TCP connections
("Streams") across circuits. End-to-end commands can be initiated
by either edge; streams are initiated by the OP.
The payload of each unencrypted RELAY cell consists of:
Relay command [1 byte]
'Recognized' [2 bytes]
StreamID [2 bytes]
Length [2 bytes]
Data [PAYLOAD_LEN-23 bytes]
The 'recognized' field is used as a simple indication that the
cell is still encrypted. It is an optimization to avoid
calculating expensive digests for every cell. When sending cells,
the unencrypted 'recognized' MUST be set to zero.
When receiving and decrypting cells the 'recognized' will always
be zero if we're the endpoint that the cell is destined for. For
cells that we should relay, the 'recognized' field will usually
be nonzero, but will accidentally be zero with P=2^-16.
If the cell is recognized, the node moves to verifying the
authenticity of the message as follows(*):
forward direction (executed by the end node):
T_{n+1} = Digest(Khf_n,T'_{n+1}||C_{n+1})
Tag = T_{n+1} ^ D(Ktf_n,T_{n+1} ^ N_{n+1})
T'_{n+1} = T_{n+1}
The message is authenticated (i.e., M = C_{n+1}) if
and only if Tag = 0
backward direction (executed by the OP):
The message is authenticated (i.e., C_{n+1} = M) if
and only if N_{n+1} = 0
The old Digest field is removed since sufficient information for
authentication is now included in the nonce part of the payload.
(*) we should consider dropping the 'recognized' field
altogether and always try to authenticate. Note that this is
an optimization question and the crypto works just as well
either way.
The 'Length' field of a relay cell contains the number of bytes
in the relay payload which contain real payload data. The
remainder of the payload is padding bytes.
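[Illustrative sketch, not part of the proposal: packing the unencrypted
relay payload laid out above. Zero padding is used here purely for
illustration; PAYLOAD_LEN is 509 and the trailing 16 bytes of the cell
carry the encrypted nonce, which is why Data is PAYLOAD_LEN-23 bytes.]

  import struct

  PAYLOAD_LEN = 509
  DATA_LEN = PAYLOAD_LEN - 23      # 7 header bytes + 16-byte nonce accounted for

  def pack_relay_payload(command, stream_id, data):
      assert len(data) <= DATA_LEN
      recognized = 0               # MUST be zero when sending
      header = struct.pack("!BHHH", command, recognized, stream_id, len(data))
      return header + data + bytes(DATA_LEN - len(data))   # zero padding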
4.2. Appending the encrypted nonce and dealing with version-homogenic
and version-heterogenic circuits
When a cell is prepared to be routed from the origin (see Section
3.1.1) the encrypted nonce N is appended to the encrypted cell
(occupying the last 16 bytes of the cell). If the cell is
prepared to be sent to a node supporting the new protocol, N is
combined with other sources to generate the layer's
nonce. Otherwise, if the node only supports the old protocol, N
is still appended to the encrypted cell (so that following nodes
can still recover their nonce), but a synchronized nonce (as per
the old protocol) is used in CTR-mode.
When a cell is sent along the circuit in the 'backward'
direction, nodes supporting the new protocol always assume that
the last 16 bytes of the input are the nonce used by the previous
node, which they process as per Section 3.2.1. If the previous
node also supports the new protocol, these cells are indeed the
nonce. If the previous node only supports the old protocol, these
bytes are either encrypted padding bytes or encrypted data.
5. Security
5.1. Resistance to crypto-tagging attacks
A crypto-tagging attack involves a circuit with two colluding
nodes and at least one honest node between them. The attack works
when one node makes a change to the cell (tagging) in a way that
can be undone by the other colluding party. In between, the
tagged cell is processed by honest nodes which do not detect the
change. The attack is possible due to the malleability property
of CTR-mode: a change to a ciphertext bit affects only the
respective plaintext bit in a predictable way. This proposal
frustrates the crypto-tagging attack by linking the nonce to the
encrypted message such that any change to the ciphertext results
in a random nonce and hence, random plaintext.
Let us consider the following 3-hop scenario: the entry and end
nodes are malicious and colluding and the middle node is honest.
5.1.1. forward direction
Suppose that node I tags the ciphertext part of the message
(C'_{I+1} != C_{I+1}) then forwards it to the next node (I+1). As
per Section 3.1.2, Node I+1 digests C'_{I+1} to generate T_{I+1}
and N_{I+2}. Since C'_{I+1} is different than it should be, so
are the resulting T_{I+1} and N_{I+2}. Hence, decrypting C'_{I+1}
using these values results in a random string for C_{I+2}. Since
C_{I+2} is now just a random string, it is decrypted into a
random string and cannot be 'recognized' nor
authenticated. Furthermore, since C'_{I+1} is different than what
it should be, T'_{I+1} (i.e., the running digest of the middle
node) is now out of sync with that of the OP, which means that
all future cells sent through this node will decrypt into garbage
(random strings).
Likewise, suppose that instead of tagging the ciphertext, Node I
tags the encrypted nonce, i.e., N'_{I+1} != N_{I+1}. Now, when Node
I+1 digests the payload the tweak T_{I+1} is fine, but using it
to decrypt N'_{I+1} again results in a random nonce for
N_{I+2}. This random nonce is used to decrypt C_{I+1} into a
random C_{I+2} which is not recognized by the end node. Since
C_{I+2} is now a random string, the running digest of the end
node is now out of sync, which prevents the end node from
decrypting further cells.
5.1.2. Backward direction
In the backward direction the tagging is done by Node I+2 and the
untagging by Node I. Suppose first that Node I+2 tags the
ciphertext C_{I+2} and sends it to Node I+1. As per Section
3.2.1, Node I+1 first digests C_{I+2} and uses the resulting
T_{I+1} to generate a nonce N_{I+1}. From this it is clear that
any change introduced by Node I+2 influences the entire payload
and cannot be removed by Node I.
Unlike in Section 5.1.1., the cell is blindly delivered by Node I
to the OP which decrypts it. However, since the payload leaving
the end node was modified, the message cannot be authenticated by
the OP which can be trusted to tear down the circuit.
Suppose now that tagging is done by Node I+2 to the nonce part of
the payload, i.e., N_{I+2}. Since this value is encrypted by Node
I+1 to generate its own nonce N_{I+1}, again, a random nonce is
used which affects the entire keystream of CTR-mode. The cell
again cannot be authenticated by the OP and the circuit is torn
down.
We note that the end node can modify the plain message before
ever encrypting it and this cannot be discovered by the Tor
protocol. This vulnerability is outside the scope of this
proposal and users should always use TLS to make sure that their
application data is encrypted before it enters the Tor network.
5.2. End-to-end authentication
Similar to the old protocol, this proposal only offers end-to-end
authentication rather than per-hop authentication. However,
unlike the old protocol, the ADL-construction is non-malleable
and hence, once a non-authentic message was processed by an
honest node supporting the new protocol, it is effectively
destroyed for all nodes further down the circuit. This is because
the nonce used to de/encrypt all messages is linked to (a digest
of) the payload data.
As a result, while honest nodes cannot detect non-authentic
messages, such nodes still destroy the message thus invalidating
its authentication tag when it is checked by edge nodes. As a
result, security against crypto-tagging attacks is ensured as
long as an honest node supporting the new protocol processes the
message between two dishonest ones.
5.3 The Running Digest
Unlike the old protocol, the running digest is now computed as
the output of a GHASH call instead of a hash function call
(SHA256). Since GHASH does not provide the same type of security
guarantees as SHA256, it is worth discussing why security is not
lost from computing the running digest differently.
The running digest is used to ensure that if the same payload is
encrypted twice, then the resulting ciphertext does not remain
the same. Therefore, all that is needed is that the digest should
repeat with low probability. GHASH is a universal hash function,
hence it gives such a guarantee assuming its key is chosen
uniformly at random.
Hi Nick, George, David,
(I'm sending this email to tor-dev so everyone knows how Core Tor
merges are going.)
Mainline Mergers
David is back from leave, so I'm going to stop doing mainline merges.
But please let me know if there's a merge I can help with.
(Email or Signal is best, IRC has a lot of backlog.)
Do we need to do a handover some time?
The next team meeting might be a good time.
Mainline Merge Ready Tickets
I moved my mainline merge trac wiki queries to this page:
https://trac.torproject.org/projects/tor/wiki/org/teams/NetworkTeam/Mainlin…
That page should show all of the mainline merge_ready tickets, sorted
by owner and reviewer. Your name is in bold, so you can work out which
tickets you should merge. (We want 3 people to look at every ticket
before it merges, except for trivial changes.)
Here is our full list of task tracking wiki pages:
https://trac.torproject.org/projects/tor/wiki/org/teams/NetworkTeam#TaskTra…
When does 0.4.0 stop being mainline?
It looks like people aren't merging backports to 0.4.0 any more.
That's probably a good idea: we should minimise release candidate changes.
When should I start doing 0.4.0 merges as part of the backports?
Backport Status
We released 0.4.0.4-rc last week, so I'm going to backport some
low-risk changes to 0.2.9 and later. Most of these changes have been
tested in 0.4.0.3-alpha.
I should be able to do the backports tomorrow or Tuesday.
Here are the backports for the next few days:
https://trac.torproject.org/projects/tor/wiki/user/teor#Backports:0.5dayspe…
Here are the backports I will do after I get back from my leave in May:
https://trac.torproject.org/projects/tor/wiki/user/teor/HiddenBackports
T
--
teor
----------------------------------------------------------------------
Hello list,
This is a thread summarizing and brainstorming various denial-of-service
defences for onion services, after an in-depth discussion with David Goulet.
We've been thinking about denial of service defences for onion services
lately. This is a recurrent topic that keeps coming up every once in
a while: the last time we had to tackle it was back in early 2018, when
we had to design a DoS mitigation subsystem because the network was crumbling
down (https://trac.torproject.org/projects/tor/ticket/24902).
Unfortunately, while the DoS mitigation subsystem improved the health of the
network and stopped the DoS attacks back then, it did not address the total
space of possible attacks, and onion services and the network are still open to
various attacks. The main DoS attack right now is the naive attack of flooding
the service with too many introduction requests, and this is the attack that
this post is going to deal with.
We don't like DoS attacks because they cause two issues to Tor:
a) They damage the health of the Tor network impacting every user
b) They kill availability of legitimate onion services.
In this thread we will handle these two issues independently, as there is no
single solution that improves both areas at once. We have some pretty good
ideas on (a), but we would appreciate ideas on (b), so feel free to give us
your input.
== a) Minimizing the damage to the network caused by DoS attacks:
Most of the damage caused during DoS attacks is from the circuits created by
the attacker to introduce/rendezvous to the victim onion service, and also
by the circuits created by the victim onion service as it tries to
rendezvous with all those clients. An attacker can literally create tens of
thousands of introduction circuits in less than a minute, which get
amplified by the service launching that many rendezvous circuits. Not good.
Here are a few ways to reduce the damage to the network:
== 1) Rate limiting introduction circuits
There should be a way to rate-limit introductions so that services do not
get overwhelmed. There are various places where we can rate-limit: we
could rate-limit on the guard-layer, or on the intro-point layer or on
the service-layer.
We have already attempted rate-limiting on the guard-layer with
#24902, but it's hard to go deeper there because the guard does not know
if the circuit is a DoS attacker, or a busy onion service, or 150 Tor
users in an airport. We also think that rate-limiting on the
service-layer won't do much good since that's too far down the circuit,
and we are trying to reduce the operations it has to do so that it
doesn't get overwhelmed (see #15463 for various queue-management
approaches for rate-limiting on the service side).
So we've been thinking of rate-limiting on the introduction point layer,
since it's a nice soaking point that does not do much right now. See
#15516 (comment 28) for a concrete proposal by arma which results in far
less damage to the network (since evil traffic does not get carried
through to the service-side introduction circuit, and no extra rendezvous
circuits get launched), and also a swifter way for legit clients to know
that an onion-service circuit won't work.
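(As a purely illustrative sketch of the kind of mechanism this could be,
not the exact design from #15516 comment 28: a token bucket on the
intro-point side, with made-up parameters.)

  # Toy token-bucket sketch of intro-point-side rate limiting.
  import time

  class IntroRateLimiter:
      def __init__(self, rate_per_sec=25.0, burst=200):
          self.rate = rate_per_sec
          self.burst = burst
          self.tokens = float(burst)
          self.last = time.monotonic()

      def allow_introduce2(self):
          """Return True if an INTRODUCE2 may be relayed to the service."""
          now = time.monotonic()
          self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
          self.last = now
          if self.tokens >= 1.0:
              self.tokens -= 1.0
              return True
          return False             # drop (or NACK) the introduction request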
== 2) Stop needless circuit rotation on service-side
Right now, services will rotate their introduction circuits after a
certain number of introductions (#26294). This means that during an
attack, the service not only needs to handle thousands of fake
introduction circuits, but also continuously tear down and recreate
introduction circuits and publish new descriptors. See comment 8 on that
ticket for a short-term proposal on how to improve the situation here,
by not continuously rotating introduction points.
== 3) Optimize CPU performance on the service-side
Right now, onion services during an attack are actually CPU bound. See
#30221 for various improvements we can do to improve the performance of
services. However, improving CPU performance might have the opposite effect,
since processing cells quicker means that the service will make even more
rendezvous circuits.
== 4) Make sure attackers don't take shortcuts around the protocol
We should make sure that attackers don't take shortcuts around the Tor
protocol to launch their attacks. Examples here involve requiring a
proof-of-rendezvous from clients (#25066), and not allowing single-hop
proxies to do introductions (#22689).
The above suggestions (maybe in priority order) are ways we can reduce the
damage dealt to the network by DoS attackers. But that still does not make
DoS attacks less effective. So here follows the section about improving
service availability:
== b) Improve service availability during DoS attacks
Unfortunately, it's really hard to accurately stop DoS attacks in the Tor
protocol. There is just no good way to distinguish between innocent clients
trying to access content, and a bad actor trying to disable an onion service.
Here is the main way we've thought of addressing this issue:
== 1) Binding the application-layer with the Tor introduction-layer
We think that the Tor protocol layer might not be the right place for
handling DoS attacks. There are literally million-dollar companies trying
hard to tackle this issue on the application-layer, where it's easier
since you can do machine learning, give out captchas, zone out users,
etc. And that's why we think that the solution to this issue lies on the
application-layer and not on the Tor protocol layer.
In particular, a plausible solution here might involve for the client to
embed application-layer information (e.g. a username/password) in its
INTRODUCE1 cell, which then gets passed to the service. The service can
then check whether the given username/password should be allowed to
connect (see "rendezvous approver" concept at #16059), and allow or reject
the connection as it wishes. This way onion service operators can have
complicated application-layer software that analyzes the activity of users
and decide whether users should be allowed in or not (based on the number
of introductions, or their application-layer (web) activity).
   +===========================================+
   |                Tor network                |
   +===========================================+
        ^                                  ^
        |             +-----+              |
        +------------>| Tor |--------------+
           INTRO2     | HS  |   rendezvous circuit
           with       +-----+   only if approved
           user/pass     ^
                         |
                         |
                         v
                    +----------+     +-------+
                    |Rendezvous|<--->|sqlite?|
                    |approver  |     +-------+
                    +----------+
We think that this is a solution that could allow onion services to
continue existing under high-load scenarios, since no rendezvous circuits
would be established during DoS scenarios (and we know that rendezvous
circuits is what causes the most CPU/network/availability damage).
However, this is a very complicated solution from an engineering
perspective given that it requires changes on the client-side (to enhance
INTRO1 cells with application-layer data), and also involves various
enhancements on the service-side (various control port commands to
interact with the (nonexistent) "rendezvous approver" software, which in
turn needs to interact with other application-layer software, e.g. sql
databases to manage membership).
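As a toy sketch of what the approver's decision logic could look like (the
control port plumbing it would need does not exist, and the schema and
limits below are made up):

  # Toy "rendezvous approver" decision logic: given a username/password
  # extracted from an introduction request, decide whether the service
  # should launch a rendezvous circuit.
  import sqlite3

  db = sqlite3.connect("members.sqlite")
  db.execute("CREATE TABLE IF NOT EXISTS members (user TEXT PRIMARY KEY, "
             "secret TEXT, intros_last_hour INTEGER DEFAULT 0)")

  def approve_rendezvous(user, secret, max_intros_per_hour=60):
      row = db.execute("SELECT secret, intros_last_hour FROM members "
                       "WHERE user = ?", (user,)).fetchone()
      if row is None or row[0] != secret:
          return False             # unknown user or bad credential
      if row[1] >= max_intros_per_hour:
          return False             # this user is introducing too often
      db.execute("UPDATE members SET intros_last_hour = intros_last_hour + 1 "
                 "WHERE user = ?", (user,))
      db.commit()
      return True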
There are also serious UX concerns about how this would look on the
client side. Also, how does this interact with client auth? And how does
this interact with intro-point-level rate limiting proposed above
(onions should be given the option to disable intro-layer rate limiting)?
How is this related to #17254?
All in all, we feel like we have pretty good options for reducing the
damage that DoS attacks cause on our network, but we are still lacking
easy and practical solutions for ensuring availability of onion services
that are under DoS. For the next months, we plan to focus on reducing
the damage on the network, since the damage on the network has a
cumulative effect as circuits fail and get endlessly retried, where
nothing ends up working right. At the same time, we will be thinking of
good solutions for keeping a high availability on services that receive
DoS attacks.
We would love your feedback and suggestions.
Thanks!
Hi all,
We finished our first working version of sbws in March, and deployed
it to a directory authority. We're now working on deploying it to a
few more directory authorities:
https://trac.torproject.org/projects/tor/ticket/29290
We're also working on archiving and analysing the bandwidth files
produced by sbws and Torflow:
https://trac.torproject.org/projects/tor/ticket/21378
During this work, we've discovered some missing sbws features:
https://trac.torproject.org/projects/tor/ticket/30255
We need a better process for proposing and reviewing sbws changes.
At the moment, I am spending a lot of time reviewing and designing
sbws changes. And that's not sustainable. We need a process that works
when I go on leave, or get busy with other tasks.
I suggest that we use the tor proposals process:
https://gitweb.torproject.org/torspec.git/tree/proposals/001-process.txt
We can submit small changes as diffs to the bandwidth file spec:
https://gitweb.torproject.org/torspec.git/tree/bandwidth-file-spec.txt
But large changes, controversial changes, and changes with multiple
competing options should have their own proposal. Then, once we decide
what to do, we can integrate those changes into the spec.
T
--
teor
----------------------------------------------------------------------
Hi,
Nick asked me to send a status email about PrivCount, before I go on
leave for a few weeks.
Plan
We want to add the following counters to a PrivCount Proof of Concept:
* check counters (zero, relay count, time in seconds)
* consumed bandwidth
Nick also suggested adding connection counts. That seems like a good
counter, but we want to make sure we do bandwidth in the first release,
because it's a high-risk statistic.
Status
In March and April, I deferred PrivCount tasks to work on chutney for
one of our other sponsors.
I also delayed these tasks, because I was waiting for #29017 and #29018
to merge:
* #29017 PaddingStatistics should be disabled when ExtraInfoStatistics is 0
* #29018 Make all statistics depend on ExtraInfoStatistics
Tickets
The top-level ticket is:
PrivCount proof of concept with existing statistics
https://trac.torproject.org/projects/tor/ticket/27908
I was mainly working on code for these tickets:
PrivCount proof of concept: implement check counters
https://trac.torproject.org/projects/tor/ticket/29004
PrivCount proof of concept: implement consumed bandwidth counters
https://trac.torproject.org/projects/tor/ticket/29005
Make relays report bandwidth usage more often in test networks
https://trac.torproject.org/projects/tor/ticket/29019
Code
I have incomplete branches for #29004, #29005, and #29019 here:
https://github.com/teor2345/tor/tree/ticket29004-wip
https://github.com/teor2345/tor/tree/ticket29005
https://github.com/teor2345/tor/tree/ticket29019
I think all the necessary code is present in these branches.
(But maybe it's not???)
But it needs some cleanup:
* rebase onto the current master,
* put the commits on the right branches,
* make sure it does what these tickets say it should do.
I'm happy to do that after I come back from leave.
I am also happy if Nick wants to clean up this code.
See also my previous email about BridgeDB and PrivCount. Maybe we can
save ourselves some effort by using PrivCount's obfuscation on
BridgeDB's statistics.
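As a rough illustration of the "add noise, then bin" part of that
obfuscation (the real PrivCount protocol also blinds counters and splits
them across share keepers, and chooses its noise distribution and
parameters carefully; the values below are made up):

  # Illustration only: obfuscate a single counter before publication.
  import random

  def obfuscate_count(true_count, sigma=10.0, bin_size=8):
      noisy = true_count + random.gauss(0.0, sigma)     # add calibrated noise
      noisy = max(0.0, noisy)
      return int(round(noisy / bin_size)) * bin_size    # round to a bin boundary

  print(obfuscate_count(12345))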
T
--
teor
----------------------------------------------------------------------