Hello,
Fernando Fernández Mancera wrote: [...]
Motivation:
Currently Tor users are reusing a given circuit for ten minutes (by default) after it's first used. This time is too long because a malicious Exit relay can trace a user's pseudonymous profile, especially if connections from multiple protocols are put on the same circuit.
Interesting proposal.
Please see this: https://lists.torproject.org/pipermail/tor-dev/2014-September/007517.html
And especially this reply: https://lists.torproject.org/pipermail/tor-dev/2014-September/007518.html
It would be very nice if you could glue these into this proposal as well.
This time is set by the MaxCircuitDirtiness parameter, whose default value is ten minutes.
I have been thinking of a way to fix this. The first idea that came to my mind was to use StreamIsolationByHost and StreamIsolationByPort for it, but I wasn't able to sort it out.
One day, I thought "Why is time so important?" and later on I realized that focusing on the amount of bytes running through the circuit could end up being a better approach to this problem.
It makes sense, but the hardest thing here is coming up with the right byte threshold, so that we don't end up building more circuits on average than we do with the 10-minute time dirtiness configuration.
Design:
I propose two options to reduce this problem, both based on taking into account the amount of bytes running through a circuit.
MaxCircuitSizeDirtiness (tentative parameter name) will take an integer within a bounded interval that represents the maximum amount of bytes that can be written/read by the circuit (we need to discuss whether one value should cover both directions). If the circuit exceeds that amount, new streams won't use this circuit anymore.
MaxCircuitSizeDirtinessByPort (tentative parameter name) will take an array of integers, each within a bounded interval, that represent the maximum amount of bytes that can be written/read by the circuit per port (StreamIsolationByPort), with the same open question about using one value for both directions. This array is parallel to the array of ports from StreamIsolationByPort. If the circuit exceeds that amount, new streams won't use this circuit anymore.
I just want to make sure I understand this correctly, because it reads "maximum amount of bytes...": the MaxCircuitSizeDirtiness counter will count bytes read/written on a circuit by all streams attached to that circuit, and as soon as the threshold is reached the circuit will be marked as dirty, but the circuit will only be closed when all streams are idle, correct? For example:
If MaxCircuitSizeDirtiness is 10000 bytes (just as an example), I could download a file of 150000 bytes from a destination host:36455 over the same circuit, via a single stream, and as soon as that stream is idle, count 150000 > 10000 --> mark circuit dirty, stop attaching new streams to it? I think you thought of this, but I want to confirm.
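To make the question concrete, here is a minimal sketch of the semantics as I read them. It is a toy model in Python, not Tor code; the class and method names are made up for illustration, and it assumes read and written bytes share one counter and that the circuit turns dirty as soon as the count exceeds the limit.

```python
from dataclasses import dataclass, field


@dataclass
class Circuit:
    """Toy model of a client circuit with a byte-based dirtiness limit."""
    max_size_dirtiness: int      # hypothetical MaxCircuitSizeDirtiness, in bytes
    bytes_seen: int = 0          # bytes read + written by all attached streams
    open_streams: set = field(default_factory=set)

    def is_dirty_by_size(self) -> bool:
        return self.bytes_seen > self.max_size_dirtiness

    def attach_stream(self, stream_id: int) -> bool:
        """Attach a new stream unless the circuit is already dirty."""
        if self.is_dirty_by_size():
            return False
        self.open_streams.add(stream_id)
        return True

    def account(self, stream_id: int, nbytes: int) -> None:
        """Count traffic; a stream that is already attached is never cut off."""
        if stream_id not in self.open_streams:
            raise ValueError("stream is not attached to this circuit")
        self.bytes_seen += nbytes

    def close_stream(self, stream_id: int) -> None:
        self.open_streams.discard(stream_id)

    def can_be_closed(self) -> bool:
        """A dirty circuit is torn down only once no streams remain on it."""
        return self.is_dirty_by_size() and not self.open_streams


circ = Circuit(max_size_dirtiness=10_000)
first_attach = circ.attach_stream(1)    # accepted: nothing counted yet
circ.account(1, 150_000)                # the whole download stays on this circuit
second_attach = circ.attach_stream(2)   # refused: 150000 > 10000
closable_while_busy = circ.can_be_closed()
circ.close_stream(1)
closable_when_idle = circ.can_be_closed()
```

In this model the 150000-byte download finishes on the original circuit, a second stream is refused once the limit is passed, and the circuit becomes closable only after the first stream goes away.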
Regarding default values, it would be useful to set one a bit lower than the average amount of bytes per circuit. For MaxCircuitSizeDirtinessByPort, after discussing it, we shouldn't set a default value, because someone could identify the port used. As for MaxCircuitDirtiness, if the other options are set by default it could be larger, say thirty minutes, so that the circuit is changed anyway if the user doesn't send/receive a significant amount of data.
Security Implications:
It is believed that the proposed changes will improve anonymity for end users. The end user won't reuse a given circuit after sending a considerable amount of bytes over it, making it more difficult for malicious Exit relays to trace a user's pseudonymous profile.
Obviously this is a matter of probability: sensitive data can leak in a small amount of traffic, but it is even more likely to leak in a large amount.
Specification:
In order to implement this feature we will need to add some new functionality. We need to parse MaxCircuitSizeDirtiness and MaxCircuitSizeDirtinessByPort from the torrc config file. We also need to create a function, or extend an existing one, that checks the amount of bytes running through the circuit and, if this amount is higher than the configured value, considers the circuit dirty.
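A rough sketch of those two steps, in Python rather than Tor's C and purely for illustration. Everything here is an assumption: the option names are the tentative ones from this proposal, the comma-separated value format and the port-list option are invented for the example, and none of the helper functions exist in Tor.

```python
from typing import Dict, List, Optional

# All option names and value formats below are hypothetical.
EXAMPLE_TORRC = """
MaxCircuitSizeDirtiness 5000000
StreamIsolationByPort 22,443
MaxCircuitSizeDirtinessByPort 1000000,20000000
"""


def parse_torrc(text: str) -> Dict[str, str]:
    """Split 'Key Value' lines, ignoring blank lines and '#' comments."""
    options: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        key, _, value = line.partition(" ")
        options[key] = value.strip()
    return options


def parse_int_list(value: str) -> List[int]:
    return [int(item) for item in value.split(",") if item.strip()]


def size_limit_for_port(options: Dict[str, str], port: int) -> Optional[int]:
    """Per-port limit if configured, else the global limit, else None."""
    ports = parse_int_list(options.get("StreamIsolationByPort", ""))
    limits = parse_int_list(options.get("MaxCircuitSizeDirtinessByPort", ""))
    if len(ports) != len(limits):
        raise ValueError("port list and limit list must have the same length")
    per_port = dict(zip(ports, limits))  # the two arrays are parallel
    if port in per_port:
        return per_port[port]
    default = options.get("MaxCircuitSizeDirtiness")
    return int(default) if default is not None else None


def is_dirty_by_size(bytes_seen: int, limit: Optional[int]) -> bool:
    """A circuit with no configured limit is never dirty by size."""
    return limit is not None and bytes_seen > limit


options = parse_torrc(EXAMPLE_TORRC)
ssh_limit = size_limit_for_port(options, 22)
web_limit = size_limit_for_port(options, 80)
```

Here port 22 picks up its own entry from the parallel array, while port 80 has none and falls back to the global MaxCircuitSizeDirtiness value.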
Compatibility:
The proposed changes should not create any compatibility issues. New Tor clients will be able to take advantage of this without any modification to the network.
Implementation:
It is proposed that MaxCircuitSizeDirtiness be enabled by default and that MaxCircuitDirtiness be increased to thirty minutes.
It is proposed that MaxCircuitSizeDirtinessByPort not be enabled by default for ports 22, 53 and 80, as with StreamIsolationByPort.
Tor Browser, or any other Tor application that manages circuits on its own because the KeepAliveIsolateSOCKSAuth option is active by default, shouldn't be affected by this new feature, in the same way that it currently ignores the MaxCircuitDirtiness parameter.
Performance and scalability notes:
The proposed changes will reduce Tor network stress, as users who do not exceed the set amount will build a third as many circuits (if the default MaxCircuitDirtiness value becomes thirty minutes instead of ten).
I want to work on demonstrating that through research, but first it would be nice to get the idea accepted.
Looking forward to seeing some preliminary statistics. Please think of this threat model:
- an attacker discovers the location of an onion service. He hijacks it, but does not alter the content or the functionality of the service, so as not to signal the discovery to its users and to deanonymize as many of them as he can.
- he starts serving useless noise on that service so the byte limit threshold is reached fast, forcing users to build circuits more often than with the time dirtiness method.
- the attacker also controls some hostile relays that could potentially be picked in any newly created circuit.
Do we increase the chances of success for this kind of attacker? Make it harder for them? Or does the game remain unchanged?
The time dirtiness cannot be gamed by the attacker because it is based just on the user's action (or inaction), but the bytes sent over the circuit can. That is why it is there in the first place.
If this sounds bad, here are two crazy ideas you could take into consideration when doing the numbers:
1. Put a hard limit per stream on how much a single stream can count towards the MaxCircuitSizeDirtiness threshold, expressed in %. If that limit is exceeded, the bytes in excess are simply not counted, so it becomes impossible for a circuit to be marked dirty "by size" just by being used by a single stream.
2. Make MaxCircuitSizeDirtiness random, a value between n and m, where m cannot be greater than n * q, picked for *each* circuit.
3. Both 1. and 2.?
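To show what I mean, a small sketch of both ideas in Python. It is illustrative only: the names, the 60% share and the values of n and q are arbitrary assumptions, not suggested defaults.

```python
import random


def pick_threshold(n: int, q: float, rng: random.Random) -> int:
    """Idea 2: draw a per-circuit threshold between n and m = n * q."""
    if n <= 0 or q < 1:
        raise ValueError("need n > 0 and q >= 1")
    return rng.randint(n, int(n * q))


class CappedCounter:
    """Idea 1: one stream may contribute at most a fixed % of the threshold."""

    def __init__(self, threshold: int, max_stream_percent: int):
        if not 0 < max_stream_percent < 100:
            raise ValueError("percentage must be between 1 and 99")
        self.threshold = threshold
        self.per_stream_cap = threshold * max_stream_percent // 100
        self.counted = {}  # stream id -> bytes counted towards the threshold

    def account(self, stream_id: int, nbytes: int) -> None:
        # Bytes beyond the per-stream cap are dropped from the count.
        already = self.counted.get(stream_id, 0)
        self.counted[stream_id] = min(already + nbytes, self.per_stream_cap)

    def is_dirty(self) -> bool:
        return sum(self.counted.values()) > self.threshold


# Idea 1: a noisy stream alone cannot push the circuit over the limit.
counter = CappedCounter(threshold=10_000, max_stream_percent=60)
counter.account(1, 1_000_000)           # hostile service floods one stream
dirty_after_noise = counter.is_dirty()
counter.account(2, 5_000)               # a second, unrelated stream
dirty_after_second = counter.is_dirty()

# Idea 2: every circuit gets its own threshold, so it is harder to predict.
rng = random.Random()
thresholds = [pick_threshold(10_000, 1.5, rng) for _ in range(5)]
```

With the cap in place the flooding stream counts for at most 6000 bytes, so the circuit only turns dirty once a second stream adds its share. The random threshold does not stop an attacker from sending noise, it only makes the exact rotation point less predictable.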
Either way, thanks for working on this - this area needs some attention and I am sure we can do better.