Greetings tor-dev!
This email opens a discussion on adding tracing to little-t tor. Tracing can be an abstract notion, so I'll start with a quick overview of what it is, what we can achieve with it, and its use cases within tor. I'll finish with one last point: safety.
This email doesn't go into the technical details of userspace tracing, that is, how and what will be done to add it to tor. That is for another discussion.
1. Overview
Long story short, you can see tracing as a specific kind of logging: it records information about the application at runtime using tracepoints (similar to logging statements) so the data can be analyzed later. The main differences from logging come down to two things: performance and API stability.
Tracing usually implies high performance, meaning it adds very little overhead to the application so as to disrupt its normal behavior as little as possible. This is extremely useful when you want to catch race conditions or performance bottlenecks.
Userspace tracers usually have an "in-process library", which in short records raw data from the application and moves it to an outside buffer. That buffer is then flushed to disk or to the network by the tracer's external component, so the data can be analyzed after collection.
So all a tracer does within the application, when a tracepoint is hit, is copy some data into a buffer and yield back to the application.
The other part is API stability. Logs (say, at DEBUG level) usually don't have strict stability requirements between released versions. But tracing events (tracepoints) are exposed to the outside for tracers to hook into, and for people to run analysis tools on the recorded data. Stability is therefore strongly encouraged: what a tracepoint exposes, once released as stable, really should not change much over time.
With a proper abstraction in the application, we can offer stable tracepoints that a variety of tracers can hook into at runtime. It is all about providing an interface to the outside world.
2. Why Tracing in Tor
The tor software is a very complex beast. It has dozens of subsystems with various interactions between them. One of tor's main jobs is to relay data as fast as possible in order to keep latency low. This means there are code paths considered "fast path", implying that they must remain light and fast. One example is the crypto code that is hit for each cell.
Tracing comes in extremely handy to hunt down race conditions, performance issues, or even multithreading problems. Consider a fast relay, say 25 MB/s: if we wanted to record cell timings in order to hunt down such issues, it simply can _not_ be done with logging at debug level, since that slows tor down considerably and also fills the disk in a matter of minutes.
And using the control port is not a good solution either, for two main reasons: string formatting happens at each event, and the control port is part of the mainloop. Anything you send over the control port adds overhead to tor's overall behavior, which is bad when you are hunting down races.
One concrete example of tracing being used in tor in the past is the rewrite of the cell scheduler (KIST). In order to measure cell timings within tor so bottlenecks could be found, tracing had to be added so millions of events could be recorded within a few minutes on a fast relay in production.
It is in high-pressure situations like this that tracing comes in handy. Tracing was also used recently to find onion service v3 reachability issues. In order to correlate connection-, cell- and circuit-level problems with the higher-level HS subsystem, we recorded events in all those subsystems, matched them up with the precise timing tracing offers, and analyzed the results after recording the data.
3. Safety Discussion
Onto the last point I wanted to raise. If we allow anyone to record very low-level data from tor, there is an obvious safety question that must be asked.
Over the years, I've talked about tracing with many people in Tor, and the consensus was always that it should never be enabled in production. That is, the packages shipped by Tor or by distros should _never_ build the tracepoints.
In other words, it should be considered a development-only option. Not only an option, but compiled _out_ in production: one would have to explicitly build the tracepoints into tor.
For example (nothing final, just to show the idea):
$ ./configure --enable-tracing
I personally think that should be enough, since the presence of the code upstream won't stop people from using it (for good or ill), but we can prevent it from being in any legitimate Tor package out there. See it a bit like the obsolete Tor2web option, which was never enabled in any package published by the Tor Project or by distros; one had to explicitly enable it at configure time.
The ControlPort is allowed in production, and if a malicious actor gets access to it, then it's game over. I see tracing the same way, but at least we can control its availability as a compile-time feature, which we can't for the ControlPort as of today.
Any feedback is very welcome! Concerns, questions, thoughts.
Cheers! David
Hello tor-devs,
I am currently working on a DoS mitigation system aiming to protect the availability of onion services flooded with INTRO2 cells. My idea is to use a (Privacy Pass-like) token-based approach, as suggested in https://trac.torproject.org/projects/tor/ticket/31223#comment:6
For the evaluation of a first prototype I would like to compare CPU usage at the onion service when a) launching a rendezvous circuit and b) validating a (potentially invalid) token. Is there an easy way to measure the CPU time a service spends on all operations triggered when launching a new rendezvous circuit? Has somebody done that before? Basically, I want to measure how much CPU time we save if we do not launch the rendezvous circuit. So far I have identified the following functions: launch_rendezvous_point_circuit() and service_rendezvous_circ_has_opened(). I understand that there are more operations involved in building new circuits, since circuits are built hop by hop. How can I identify all relevant functions triggered after launching the rendezvous circuit and include them in my measurements?
Once I have some reliable results I will provide you with more information on what I am doing and how it is working so far.
Cheers Valentin
This is my first post on this list :-). So have mercy if I overlooked resources that answer my question. Also, I am only beginning to familiarize myself with the existing code base.
On 13 Jan (13:39:37), Valentin Franck wrote:
> Hello tor-devs,
Hi Valentin!
> I am currently working on a DoS mitigation system aiming to protect the availability of onion services flooded with INTRO2 cells. My idea is using a (Privacy Pass like) token based approach as suggested in https://trac.torproject.org/projects/tor/ticket/31223#comment:6
Do _please_ talk to asn here, as he is also doing research on this.
> For the evaluation of a first prototype I would like to compare CPU usage times at the onion service when a) launching a rendezvous circuit and b) validating a (potentially invalid) token. Is there an easy way, to measure the CPU time a service spends for all operations triggered when launching a new rendezvous circuit? Has somebody done that before? Basically, I want to measure how much CPU time we save, if we do not launch the rendezvous circuit. So far I have identified the following functions: launch_rendezvous_point_circuit() and service_rendezvous_circ_has_opened(). I understand that there is more operations involved for building new circuits, since circuits are built hop by hop. How can I identify all relevant functions triggered after launching the rendezvous circuit and include them in my measurements?
I use a pretty extensive tracing patchset on "tor" to measure hidden service timings, so all this work is done, just not upstream yet...
But it turns out that I'm currently actively working on the tracing API and on adding tracepoints to tor for an upstream merge in the coming months.
If you can wait that long, you might have it all in tor soon. Otherwise, I can point you to the branch, but it will require some work on your side to make it work with the specific tracer I use (LTTng userspace).
But at least you can see where the tracepoints are in the code:
https://gitweb.torproject.org/user/dgoulet/tor.git/tree/src/lib/trace/lttng/...
Most tracepoints are client-side for the HS. For the service side, to track the timings, I use the circuit tracepoints. Just grep for where they are put in the code.
Hope this helps a bit until we have tracing upstream.
Cheers! David
Hi, thanks for working on it.
At first I thought about using a PoW on the Introduction Point (I.P.) side.
Maybe a dynamic PoW? I mean, only ask for a PoW under load (the hidden service sets the INTRO1s/second threshold on the I.P.), or ask for one for every new circuit.
Then I thought that we need to fix the rendezvous verification issue too. We do not verify whether the client/user/attacker actually opened a circuit to the rendezvous point. So I thought we could make the rendezvous point sign a message and have the I.P. verify the signature before sending the INTRO2 to the HS.
But now I think we need to merge the designs and make just one proposal fixing both problems at the same time.
If we don't want to require a PoW for every new circuit, we could make the client generate a private identity (keypair) mixed with some sort of PoW, generated for every HS a client wants to connect to. This way we only do a PoW per onion, and the I.P. can keep a replay cache (or something like that) with each identity and the last time it requested a new circuit. That way we can better control the number of individual clients, and we "save the planet" by not doing a PoW for each new circuit. (Maybe this approach is what you are working on with the "token based approach".)
Sorry for my english...
tor-dev mailing list tor-dev@lists.torproject.org https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev
Hello,
Thanks for your mail. My comments are below.
On 1/15/20 6:17 PM, juanjo wrote:
> Hi, thanks for working on it.
> At first I thought about using a PoW on the Introduction Point (I.P.) side.
> Maybe a dynamic PoW? I mean only ask for PoW under load (Hidden services sets the INTRO1s/second on the I.P.) or ask for every new circuit.

From what I know, this would most likely require a more-than-1-RTT (interactive) introduction protocol. I think we definitely want to avoid that.
> Then I thought that we need to fix the Rendezvous verification issue too. We do not verify if the client/user/attacker actually opened a circuit to the Rend point. And I thought we could make the Rend sign a message and the I.P verify the signature before sending the INTRO2 to the HS.
Such a signature would have to be zero-knowledge. Otherwise we would leak the chosen RP to the IP and make deanonymization more likely. Designing an unforgeable proof of rendezvous is non-trivial (even if it were verified by the service and not the IP), because we have to assume that adversaries run their own relays. They could therefore most likely precompute/forge these proofs of rendezvous. Of course, we could try to cache which RPs have been used, but this seems like a lot of work for relatively little security benefit. I doubt this approach would discourage resourceful adversaries from DoSing services.
> But now I think we need to merge designs and make just one proposal fixing both problems at the same time.
> If we don't want to make a PoW for every new circuit, we could make the client generate a private Identity (KeyPair) mixed with some sort of PoW, generating it for every HS a client want to connect. This way we only make PoW for each onion and the IP can have a replay cache (or something like that) with each identity and the last time it requested a new circuit. We can better control with this way the number of individual clients and we "save the planet" by not making a PoW for each new circuit. (Maybe this approach is what your are working at with the "token based approach").
I believe we should avoid making different connections to an onion service by the same user linkable by the IP/service, as this would turn anonymity into pseudonymity and therefore ease user identification. Of course, there are crypto schemes that allow anonymous credentials/authentication. Unfortunately, I am not sure how such anonymous credentials could be useful for DoS mitigation. We would need some mechanism to detect misbehavior and revoke the credentials, or to limit their use to a certain number of authentications. I am not sure how this can be done if different authentications using the same credentials are truly unlinkable.