Filename: xxx-flashflow.txt
Title: FlashFlow: A Secure Speed Test for Tor (Parent Proposal)
Author: Matthew Traudt, Aaron Johnson, Rob Jansen, Mike Perry
Created: 23 April 2020
Status: Draft
1. Introduction
FlashFlow is a new distributed bandwidth measurement system for Tor that consists of a single authority node ("coordinator") instructing one or more measurement nodes ("measurers") when and how to measure Tor relays. A measurement consists of the following steps:
  1. The measurement nodes demonstrate to the target relay permission to
     perform measurements.
  2. The measurement nodes open many TCP connections to the target relay
     and create a one-hop circuit to the target relay on each one.
  3. For 30 seconds the measurement nodes send measurement cells to the
     target relay and verify that the cells echoed back match the ones
     sent. During this time the relay caps the amount of background
     traffic it transfers. Background and measurement traffic are
     handled separately at the relay. Measurement traffic counts towards
     all the standard existing relay statistics.
  4. For every second during the measurement, the measurement nodes
     report to the authority node how much traffic was echoed back. The
     target relay also reports the amount of per-second background
     (non-measurement) traffic.
  5. The authority node sums the per-second reported throughputs into 30
     sums (one for each second) and calculates the median. This is the
     estimated capacity of the relay.
FlashFlow performs a measurement of every relay according to a schedule described later in this document. Periodically it produces relay capacity estimates in the form of a v3bw file, which is suitable for direct consumption by a Tor directory authority. Alternatively an existing load balancing system such as Simple Bandwidth Scanner could be modified to use FlashFlow's v3bw file as input.
It is envisioned that each directory authority that wants to use FlashFlow will run their own FlashFlow deployment consisting of a coordinator that they run and one or more measurers that they trust (e.g. because they run them themselves), similar to how each runs their own Torflow/sbws. Section 5.2 of this proposal describes long term plans involving multiple FlashFlow deployments.
FlashFlow is more performant than Torflow: FlashFlow takes 5 hours to measure the entire existing Tor network from scratch (with 3 Gbit/s measurer capacity) while Torflow takes 2 days; FlashFlow measures relays it hasn't seen recently as soon as it learns about them (i.e. every new consensus) while Torflow can take a day or more; and FlashFlow accurately measures new high-capacity relays the first time and every time while Torflow takes days/weeks to assign them their full fair share of bandwidth (especially for non-exits). FlashFlow is more secure than Torflow: FlashFlow allows a relay to inflate its measured capacity by up to 1.33x (configured by a parameter) while Torflow allows weight inflation by a factor of 89x [0] or even 177x [1].
After an overview in section 2 of the planned deployment stages, sections 3, 4, and 5 discuss the short, medium, and long term deployment plans in more detail.
2. Deployment Stages
FlashFlow's deployment shall be broken up into three stages.
In the short term we will implement a working FlashFlow measurement system. This requires code changes in little-t tor and an external FlashFlow codebase. The majority of the implementation work will be done in the short term, and the product is a complete FlashFlow measurement system. Remaining pieces (e.g. better authentication) are added later for enhanced security and network performance.
In the medium term we will begin collecting data with a FlashFlow deployment. The intermediate results and v3bw files produced will be made available (semi?) publicly for study.
In the long term experiments will be performed to study ways of using FF v3bw files to improve load balancing. Two examples: (1) using FF v3bw files instead of sbws's (and eventually phasing out torflow/sbws), and (2) continuing to run sbws but using FF's results as a better estimate of relay capacity than observed bandwidth. Authentication and other FlashFlow features necessary to make it completely ready for full production deployment will be worked on during this long term phase.
3. FlashFlow measurement system: Short term
The core measurement mechanics will be implemented in little-t tor, but a separate codebase for the FlashFlow side of the measurement system will also be created. This section is divided into three parts: first a discussion of changes/additions that logically reside entirely within tor (essentially: relay-side modifications), second a discussion of the separate FlashFlow code that also requires some amount of tor changes (essentially: measurer-side and coordinator-side modifications), and third a security discussion.
3.1 Little-T Tor Components
The primary additions/changes that entirely reside within tor on the relay side:
  - New torrc options/consensus parameters.
  - New cell commands.
  - Pre-measurement handshaking (with a simplified authentication
    scheme).
  - Measurement mode, during which the relay will echo traffic with
    measurers, set a cap on the amount of background traffic it
    transfers, and report the amount of transferred background traffic.
3.1.1 Parameters
FlashFlow will require some consensus parameters/torrc options. Each has some default value if nothing is specified; the consensus parameter overrides this default value; the torrc option overrides both.
FFMeasurementsAllowed: A global toggle on whether or not to allow measurements. Even if all other settings would allow a measurement, if this is turned off, then no measurement is allowed. Possible values: 0, 1. Default: 0 (disallowed).
FFAllowedCoordinators: The list of coordinator TLS certificate fingerprints that are allowed to start measurements. Relays check their torrc when they receive a connection from a FlashFlow coordinator to see if it's on the list. If they have no list, they check the consensus parameter. If neither exists, then no FlashFlow deployment is allowed to measure this relay. Default: empty list.
FFMeasurementPeriod: A relay should expect, on average, to be measured by each FlashFlow deployment once each measurement period. A relay will not allow itself to be measured more than twice by a FlashFlow deployment in any time window of this length. Relays should not change this option unless they really know what they're doing. Changing it at the relay will not change how often FlashFlow will attempt to measure the relay. Possible values are in the range [1 hour, 1 month] inclusive. Default: 1 day.
FFBackgroundTrafficPercent: The maximum amount of regular non-measurement traffic a relay should handle while being measured, as a percent of total traffic (measurement + non-measurement). This parameter is a trade off between having to limit background traffic and limiting how much a relay can inflate its result by handling no background traffic but reporting that it has done so. Possible values are in the range [0, 99] inclusive. Default: 25 (a maximum inflation factor of 1.33).
FFMaxMeasurementDuration: The maximum amount of time, in seconds, that is allowed to pass between the moment the relay is notified that a measurement will begin soon and the end of the measurement. If this amount of time passes, the relay shall close all measurement connections and exit its measurement mode. Note this duration includes handshake time, thus it is necessarily larger than the expected actual measurement duration. Possible values are in the range [10, 120] inclusive. Default: 45.
3.1.2 New Cell Types
FlashFlow will introduce a new cell command MEASURE.
The payload of each MEASURE cell consists of:
    Measure command [1 byte]
    Length          [2 bytes]
    Data            [Length-3 bytes]
The measure commands are:
    0 -- MSM_PARAMS    [forward]
    1 -- MSM_PARAMS_OK [backward]
    2 -- MSM_ECHO      [forward and backward]
    3 -- MSM_BG        [backward]
    4 -- MSM_ERR       [forward and backward]
Forward cells are sent from the measurer/coordinator to the relay. Backward cells are sent from the relay to the measurer/coordinator.
MSM_PARAMS and MSM_PARAMS_OK are used during the pre-measurement stage to tell the target what to expect and for the relay to positively acknowledge the message. MSM_ECHO cells are the measurement traffic; the measurer generates them, sends them to the target, and the target echoes them back. The target sends a MSM_BG cell once per second to report the amount of background traffic it is handling. MSM_ERR cells are used to signal to the other party that there has been some sort of problem and that the measurement should be aborted. These measure commands are described in more detail in the next section.
The only cell that sometimes undergoes cell encryption is MSM_ECHO; no other cell ever gets cell encrypted. (All cells are transmitted on a regular TLS-wrapped OR connection; that encryption still exists.)
The relay "decrypts" MSM_ECHO cells before sending them back to the measurer; this mirrors the way relays decrypt/encrypt RELAY_DATA cells in order to induce realistic cryptographic CPU load. The measurer usually skips encrypting MSM_ECHO cells to reduce its own CPU load; however, to verify the relay is actually correctly decrypting all cells, the measurer will choose random outgoing cells, encrypt them, remember the ciphertext, and verify the corresponding incoming cell matches.
3.1.3 Pre-Measurement Handshaking/Starting a Measurement
The coordinator connects to the target relay and sends it a MSM_PARAMS cell. If the target is unwilling to be measured at this time or if the coordinator didn't use a TLS certificate that the target trusts, it responds with an error cell and closes the connection. Otherwise it checks that the parameters of the measurement are acceptable (e.g. the version is acceptable, the duration isn't too long, etc.). If the target is happy, it sends a MSM_PARAMS_OK, otherwise it sends a MSM_ERR and closes the connection.
Upon learning the IP addresses of the measurers from the coordinator in the MSM_PARAMS cell, the target whitelists their IPs in its DoS detection subsystem until the measurement ends (successfully or otherwise), at which point the whitelist is cleared.
Upon receiving a MSM_PARAMS_OK from the target, the coordinator will instruct the measurers to open their TCP connections with the target. If the coordinator or any measurer receives a MSM_ERR, it reports the error to the coordinator and considers the measurement a failure. It is also a failure if any measurer is unable to open at least half of its TCP connections with the target.
The payload of MSM_PARAMS cells [XXX more may need to be added]:
    - version       [1 byte]
    - msm_duration  [1 byte]
    - num_measurers [1 byte]
    - measurer_info [num_measurers times]
      - ipv4_addr   [4 bytes]
      - num_conns   [2 bytes]
version dictates how this MSM_PARAMS cell shall be parsed. msm_duration is the duration, in seconds, that the actual measurement will last. num_measurers is how many measurer_info structs follow. For each measurer, the ipv4_addr it will use when connecting to the target is provided, as is num_conns, the number of TCP connections that measurer will open with the target. Future versions of FlashFlow and MSM_PARAMS will use TLS certificates instead of IP addresses.
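As an illustration of the MSM_PARAMS layout, the following Python sketch packs and parses such a payload. The function names are hypothetical, and the use of network byte order for num_conns is an assumption; this proposal does not yet specify either.

```python
import socket
import struct

def pack_msm_params(version, msm_duration, measurers):
    """Build a MSM_PARAMS payload.

    measurers is a list of (ipv4_addr, num_conns) pairs, with ipv4_addr
    given as a dotted-quad string.
    """
    out = struct.pack("!BBB", version, msm_duration, len(measurers))
    for addr, num_conns in measurers:
        # ipv4_addr [4 bytes], then num_conns [2 bytes, byte order assumed]
        out += socket.inet_aton(addr) + struct.pack("!H", num_conns)
    return out

def parse_msm_params(payload):
    """Return (version, msm_duration, [(ipv4_addr, num_conns), ...])."""
    version, msm_duration, num_measurers = struct.unpack_from("!BBB", payload, 0)
    measurers = []
    offset = 3
    for _ in range(num_measurers):
        addr = socket.inet_ntoa(payload[offset:offset + 4])
        (num_conns,) = struct.unpack_from("!H", payload, offset + 4)
        measurers.append((addr, num_conns))
        offset += 6  # each measurer_info struct is 6 bytes
    return version, msm_duration, measurers
```

With two measurers the payload is 3 + 2*6 = 15 bytes before padding, so a single cell has room for many measurer_info structs.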
MSM_PARAMS_OK has no payload: it's just padding bytes to make the cell 514 bytes long.
The payload of MSM_ECHO cells:
- arbitrary bytes [max to fill up 514 byte cell]
The payload of MSM_BG cells:
    - second        [1 byte]
    - sent_bg_bytes [4 bytes]
    - recv_bg_bytes [4 bytes]
second is the number of seconds since the measurement began. MSM_BG cells are sent once per second from the relay to the FlashFlow coordinator. The first cell will have this set to 1, and each subsequent cell will increment it by one. sent_bg_bytes is the number of background traffic bytes sent in the last second (since the last MSM_BG cell). recv_bg_bytes is the same but for received bytes.
The payload of MSM_ERR cells:
    - err_code [1 byte]
    - err_str  [possibly zero-len null-terminated string]
The error code is one of:
    [... XXX TODO ...]
    255 -- OTHER
The error string is optional in all cases. It isn't present if the first byte of err_str is null, otherwise it is present. It ends at the first null byte or the end of the cell, whichever comes first.
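The err_str rules can be sketched in Python as follows. The function name is hypothetical, and decoding the string as UTF-8 is an assumption, since the proposal does not specify an encoding.

```python
def parse_msm_err(payload):
    """Return (err_code, err_str) from a MSM_ERR payload.

    err_str is None when the optional string is absent.
    """
    err_code = payload[0]
    rest = payload[1:]
    end = rest.find(b"\x00")
    if end == -1:
        # No null byte: the string runs to the end of the cell.
        end = len(rest)
    if end == 0:
        # First byte of err_str is null (or nothing follows): no string.
        return err_code, None
    # Encoding is an assumption made for this sketch.
    return err_code, rest[:end].decode("utf-8", errors="replace")
```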
3.1.4 Measurement Mode
The relay considers the measurement to have started the moment it receives the first MSM_ECHO cell from any measurer. At this point, the relay
  - Starts a repeating 1s timer on which it will report the amount of
    background traffic to the coordinator over the coordinator's
    connection.
  - Enters "measurement mode" and limits the amount of background
    traffic it handles according to the torrc option/consensus
    parameter.
The relay decrypts and echoes back all MSM_ECHO cells it receives on measurement connections until it has reported its amount of background traffic the same number of times as there are seconds in the measurement (e.g. 30 per-second reports for a 30 second measurement). After sending the last MSM_BG cell, the relay drops all buffered MSM_ECHO cells, closes all measurement connections, and exits measurement mode.
During the measurement the relay targets a ratio of background traffic to measurement traffic as specified by a consensus parameter/torrc option. For a given ratio r, if the relay has handled x cells of measurement traffic recently, Tor then limits itself to y = xr/(1-r) cells of non-measurement traffic this scheduling round. The target will enforce that a minimum of 10 Mbit/s of measurement traffic is recorded since the last background traffic scheduling round to ensure it always allows some minimum amount of background traffic.
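A minimal sketch of the y = xr/(1-r) limit, taking r as an integer percentage like FFBackgroundTrafficPercent. The function name is hypothetical, and the 10 Mbit/s floor on measurement traffic is left out for brevity.

```python
def background_cell_limit(measurement_cells, ratio_percent):
    """Cells of background traffic allowed in this scheduling round.

    Implements y = x*r/(1-r) with r given as an integer percentage, using
    integer arithmetic so the result is rounded down.
    """
    if not 0 <= ratio_percent <= 99:
        raise ValueError("ratio_percent must be in [0, 99]")
    return measurement_cells * ratio_percent // (100 - ratio_percent)
```

For example, with r=25% and 300 recent measurement cells the relay allows itself 100 background cells, so background traffic makes up 100/400 = 25% of the total.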
3.2 FlashFlow Components
The FF coordinator and measurer code will reside in a FlashFlow repository separate from little-t tor.
There are three notable parameters for which a FF deployment must choose values. They are:
  - The number of sockets, s, the measurers should open, in aggregate,
    with the target relay. We suggest s=160 based on the FF paper.
  - The bandwidth multiplier, m. Given an existing capacity estimate for
    a relay, z, the coordinator will instruct the measurers to, in
    aggregate, send m*z Mbit/s to the target relay. We recommend m=2.25.
  - The measurement duration, d. Based on the FF paper, we recommend
    d=30 seconds.
The rest of this section first discusses notable functions of the FlashFlow coordinator, then goes on to discuss FF measurer code that will require supporting tor code.
3.2.1 FlashFlow Coordinator
The coordinator is responsible for scheduling measurements, aggregating results, and producing v3bw files. It needs continuous access to new consensus files, which it can obtain by running an accompanying Tor process in client mode.
The coordinator has the following functions, which will be described in this section:
  - result aggregation.
  - measurement scheduling.
  - v3bw file generation.
3.2.1.1 Aggregating Results
Every second during a measurement, the measurers send the amount of verified measurement traffic they have received back from the relay. Additionally, the relay sends a MSM_BG cell each second to the coordinator with the amount of non-measurement background traffic it is sending and receiving.
For each second's reports, the coordinator sums the measurers' reports. The coordinator takes the minimum of the relay's reported sent and received background traffic. If, when compared to the measurers' reports for this second, the relay's claimed background traffic is more than what's allowed by the background/measurement traffic ratio, then the coordinator further clamps the relay's report down. The coordinator adds this final adjusted amount of background traffic to the sum of the measurers' reports.
Once the coordinator has done the above for each second in the measurement (e.g. 30 times for a 30 second measurement), the coordinator takes the median of the 30 per-second throughputs and records it as the estimated capacity of the target relay.
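The aggregation steps above can be sketched in Python as follows. The function name and data layout are illustrative assumptions, and the clamp assumes the same xr/(1-r) rule the relay uses in section 3.1.4.

```python
from statistics import median

def aggregate(measurer_reports, bg_reports, ratio_percent):
    """Estimate relay capacity from per-second reports.

    measurer_reports: one list per second, holding the bytes each measurer
        verified that second.
    bg_reports: one (sent_bg_bytes, recv_bg_bytes) pair per second, as
        claimed by the relay in its MSM_BG cells.
    ratio_percent: allowed background traffic as a percent of the total.
    """
    per_second = []
    for reports, (sent_bg, recv_bg) in zip(measurer_reports, bg_reports):
        measured = sum(reports)
        claimed_bg = min(sent_bg, recv_bg)
        # Most background traffic the coordinator will believe this second.
        max_bg = measured * ratio_percent // (100 - ratio_percent)
        per_second.append(measured + min(claimed_bg, max_bg))
    return median(per_second)
```

In a three-second toy run with r=25%, measured totals of 100, 100 and 90 and claimed background of 10, 100 and 0 give per-second values of 110, 133 and 90, so the estimate is 110.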
3.2.1.2 Measurement Schedule
The short term implementation of measurement scheduling will be simpler than the long term one due to (1) there only being one FlashFlow deployment, and (2) there being very few relays that support being measured by FlashFlow. In fact the FF coordinator will maintain a list of the relays that have updated to support being measured and have opted in to being measured, and it will only measure them.
The coordinator divides time into a series of 24 hour periods, commonly referred to as days. Each period has measurement slots that are longer than a measurement lasts (30s), say 60s, to account for pre- and post-measurement work. Thus with 60s slots there are 1,440 slots in a day.
At the start of each day the coordinator considers the list of relays that have opted in to being measured. From this list of relays, it repeatedly takes the relay with the largest existing capacity estimate. It selects a random slot. If the slot has existing relays assigned to it, the coordinator makes sure there is enough additional measurer capacity to handle this relay. If so, it assigns this relay to this slot. If not, it keeps picking new random slots until one has sufficient additional measurer capacity.
Relays without existing capacity estimates are assumed to have the 75th percentile capacity of the current network.
If a relay is not online when it's scheduled to be measured, it doesn't get measured that day.
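A sketch of this short term scheduling procedure in Python follows. The function name and arguments are hypothetical. Rather than retrying random slots until one fits, the sketch draws uniformly from the slots that still have room; this gives the same distribution over slots and cannot loop forever when no slot fits.

```python
import random

def schedule(estimates, num_slots, measurer_capacity, multiplier, rng=random):
    """Assign relays to measurement slots, largest estimate first.

    estimates: existing capacity estimates (Mbit/s) of opted-in relays.
    Returns {slot index: [estimates assigned to that slot]}.
    """
    slots = {i: [] for i in range(num_slots)}
    used = {i: 0 for i in range(num_slots)}
    for est in sorted(estimates, reverse=True):
        need = est * multiplier  # measurer capacity this relay requires
        candidates = [i for i in range(num_slots)
                      if used[i] + need <= measurer_capacity]
        if not candidates:
            raise RuntimeError("no slot has enough spare measurer capacity")
        slot = rng.choice(candidates)
        slots[slot].append(est)
        used[slot] += need
    return slots
```

Calling schedule([500, 300, 250, 200, 100, 50], 5, 1000, 2) places every relay while keeping each slot at or below 1000 Mbit/s of reserved measurer capacity.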
3.2.1.2.1 Example
Assume the FF deployment has 1 Gbit/s of measurer capacity. Assume the chosen multiplier m=2. Assume there are only 5 slots in a measurement period.
Consider a set of relays with the following existing capacity estimates and that have opted in to being measured by FlashFlow.
  - 500 Mbit/s
  - 300 Mbit/s
  - 250 Mbit/s
  - 200 Mbit/s
  - 100 Mbit/s
  - 50 Mbit/s
The coordinator takes the largest relay, 500 Mbit/s, and picks a random slot for it. It picks slot 3. The coordinator takes the next largest, 300, and randomly picks slot 2. The slots are now:
       0   |   1   |   2   |   3   |   4
    -------|-------|-------|-------|-------
           |       |  300  |  500  |
           |       |       |       |
The coordinator takes the next largest, 250, and randomly picks slot 2. Slot 2 already has 600 Mbit/s of measurer capacity reserved (300*m); given just 1000 Mbit/s of total measurer capacity, there is just 400 Mbit/s of spare capacity while this relay requires 500 Mbit/s. There is not enough room in slot 2 for this relay. The coordinator picks a new random slot, 0.
       0   |   1   |   2   |   3   |   4
    -------|-------|-------|-------|-------
      250  |       |  300  |  500  |
           |       |       |       |
The next largest is 200 and the coordinator randomly picks slot 2 again (wow!). As there is just enough spare capacity, the coordinator assigns this relay to slot 2.
       0   |   1   |   2   |   3   |   4
    -------|-------|-------|-------|-------
      250  |       |  300  |  500  |
           |       |  200  |       |
The coordinator randomly picks slot 4 for each of the two remaining relays (100 and 50 Mbit/s), in that order.
       0   |   1   |   2   |   3   |   4
    -------|-------|-------|-------|-------
      250  |       |  300  |  500  |  100
           |       |  200  |       |  50
3.2.1.3 Generating V3BW files
Every hour the FF coordinator produces a v3bw file in which it stores the latest capacity estimate for every relay it has measured in the last week. The coordinator will create this file on the host's local file system. Previously-generated v3bw files will not be deleted by the coordinator. A symbolic link at a static path will always point to the latest v3bw file.
    $ ls -l
    v3bw -> v3bw.2020-03-01-05-00-00
    v3bw.2020-03-01-00-00-00
    v3bw.2020-03-01-01-00-00
    v3bw.2020-03-01-02-00-00
    v3bw.2020-03-01-03-00-00
    v3bw.2020-03-01-04-00-00
    v3bw.2020-03-01-05-00-00
3.2.2 FlashFlow Measurer
The measurers take commands from the coordinator, connect to target relays with many sockets, send them traffic, and verify the received traffic is the same as what was sent. Measurers need access to a lot of internal tor functionality. One strategy is to house as much logic as possible inside a compile-time-optional control port module that calls into other parts of tor. Alternatively FlashFlow could link against tor and call internal tor functions directly.
[XXX for now I'll assume that an optional little-t tor control port module housing a lot of this code is the best idea.]
Notable new things that internal tor code will need to do on the measurer (client) side:
  1. Open many TLS+TCP connections to the same relay on purpose.
  2. Verify echo cells.
3.2.2.1 Open many connections
FlashFlow prototypes needed to "hack in" a flag in the open-a-connection-with-this-relay function call chain that indicated whether or not we wanted to force a new connection to be created. Most of Tor doesn't care if it reuses an existing connection, but FF does want to create many different connections. The cleanest way to accomplish this will be investigated.
On the relay side, these measurer connections do not count towards DoS detection algorithms.
3.2.2.2 Verify echo cells
A parameter will exist to tell the measurers with what frequency they shall verify that cells echoed back to them match what was sent. This parameter does not need to exist outside of the FF deployment (e.g. it doesn't need to be a consensus parameter).
The parameter instructs the measurers to check 1 out of every N cells.
The measurer keeps a count of how many measurement cells it has sent. It also logically splits its output stream of cells into buckets of size N. At the start of each bucket (when num_sent % N == 0), the measurer chooses a random index in the bucket. Upon sending the cell at that index (num_sent % N == chosen_index), the measurer records the cell.
The measurer also counts cells that it receives. When it receives a cell at an index that was recorded, it verifies that the received cell matches the recorded sent cell. If they match, no special action is taken. If they don't match, the measurer indicates failure to the coordinator and target relay and closes all connections, ending the measurement.
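The bookkeeping described above might look like the following Python sketch. The class and method names are hypothetical. The sketch assumes cells are echoed back in order on a single connection and ignores cell encryption, so the recorded cell is compared directly with the received one.

```python
import random

class EchoVerifier:
    """Record one randomly chosen cell per bucket and check it on receipt."""

    def __init__(self, bucket_size, rng=random):
        self.bucket_size = bucket_size
        self.rng = rng
        self.num_sent = 0
        self.num_recv = 0
        self.chosen_index = 0
        self.recorded = {}  # absolute send index -> cell contents

    def on_send(self, cell):
        pos = self.num_sent % self.bucket_size
        if pos == 0:
            # Start of a new bucket: choose which cell in it to record.
            self.chosen_index = self.rng.randrange(self.bucket_size)
        if pos == self.chosen_index:
            self.recorded[self.num_sent] = cell
        self.num_sent += 1

    def on_recv(self, cell):
        """Return False if a recorded cell came back altered, else True."""
        expected = self.recorded.pop(self.num_recv, None)
        self.num_recv += 1
        return expected is None or expected == cell
```

A False return from on_recv corresponds to the failure case above, where the measurer notifies the coordinator and target relay and closes all connections.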
3.2.2.2.1 Example
Consider bucket_size is 1000. For the moment ignore cell encryption.
We start at idx=0 and pick an idx in [0, 1000) to record, say 640. At idx=640 we record the cell. At idx=1000 we choose a new idx in [1000, 2000) to record, say 1236. At idx=1236 we record the cell. At idx=2000 we choose a new idx in [2000, 3000). Etc.
There are now 2000+ cells in flight and the measurer has recorded two items:
  - (640, contents_of_cellA)
  - (1236, contents_of_cellB)
Consider the receive side now. It counts the cells it receives. At receive idx=640, it checks the received cell matches the saved cell from before. At receive idx=1236, it again checks the received cell matches. Etc.
3.2.2.2.2 Motivation
A malicious relay may want to skip decryption of measurement cells to save CPU cycles and obtain a higher capacity estimate. More generally, it could generate fake measurement cells locally, ignore the measurement traffic it is receiving, and flood the measurer with more traffic than it (the measurer) is even sending.
The security of echo cell verification is discussed in section 3.3.1.
3.3 Security
In this section we discuss the security of various aspects of FlashFlow and the tor changes it requires.
3.3.1 Echo Cell Verification: Bucket Size
A smaller bucket size means more cells are checked and FF is more likely to detect a malicious target. It also means more bookkeeping overhead (CPU/RAM).
An adversary that knows bucket_size and cheats on one item out of every bucket_size items will have a 1/bucket_size chance of getting caught in the first bucket. This is the worst case adversary. While cheating on just a single item per bucket yields very little advantage, cheating on more items per bucket increases the likelihood the adversary gets caught. Thus only the worst case is considered here.
In general, the odds the adversary can successfully cheat in a single bucket are
(bucket_size-1)/bucket_size
Thus the odds the adversary can cheat in X consecutive buckets are
[(bucket_size-1)/bucket_size]^X
In our case, X will be highly varied: Slow relays won't see very many buckets, but fast relays will. The damage to the network a very slow relay can do by faking being only slightly faster is limited. Nonetheless, for now we motivate the selection of bucket_size with a slow relay:
  - Assume a very slow relay of 1 Mbit/s capacity that will cheat 1 cell
    in each bucket. Assume a 30 second measurement.
  - The relay will handle 1*30 = 30 Mbit of traffic during the
    measurement, or 3.75 MB, or 3.75 million bytes.
  - Cells are 514 bytes. Approximately (e.g. ignoring TLS) 7300 cells
    will be sent/recv over the course of the measurement.
  - A bucket_size of 50 results in about 146 buckets over the course of
    the 30s measurement.
  - Therefore, the odds of the adversary cheating successfully are
    (49/50)^(146), or about 5.2%.
This sounds high, but a relay capable of double the bandwidth (2 Mbit/s) will have (49/50)^(2*146) or about 0.3% odds of success, which is quite low.
Wanting a <1% chance that a 10 Mbit/s relay can successfully cheat results in a bucket size of approximately 125:
  - 10*30 = 300 Mbit of traffic during 30s measurement. 37.5 million
    bytes.
  - 37,500,000 bytes / 514 bytes/cell = ~73,000 cells
  - bucket_size of 125 cells means 73,000 / 125 = 584 buckets
  - (124/125)^(584) = 0.918% chance of successfully cheating
Slower relays can cheat more easily but the amount of extra weight they can obtain is insignificant in absolute terms. Faster relays are essentially unable to cheat.
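The arithmetic in this section can be reproduced with a short Python sketch. The function name is hypothetical. As in the text, it ignores TLS overhead and assumes the adversary cheats on exactly one cell per bucket.

```python
CELL_LEN = 514  # bytes per cell, as used in the text

def cheat_success_odds(relay_mbit_per_s, duration_s, bucket_size):
    """Odds that cheating on one cell per bucket is never detected."""
    total_bytes = relay_mbit_per_s * 1_000_000 * duration_s / 8
    num_buckets = total_bytes / CELL_LEN / bucket_size
    return ((bucket_size - 1) / bucket_size) ** num_buckets
```

cheat_success_odds(1, 30, 50) comes out at roughly 5%, and cheat_success_odds(10, 30, 125) at just under 1%. Both match the figures above up to rounding of the bucket count.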
3.3.2 Weight Inflation
Target relays are an active part of the measurement process; they know they are getting measured. While a relay cannot fake the measurement traffic, it can trivially stop transferring client background traffic for the duration of the measurement yet claim it carried some. More generally, there is no verification of the claimed amount of background traffic during the measurement. The relay can claim whatever it wants, but it will not be trusted above the ratio the FlashFlow deployment is configured to know. This places an easy to understand, firm, and (if set as we suggest) low cap on how much a relay can inflate its measured capacity.
Consider a background/measurement ratio of 1/4, or 25%. Assume the relay in question has a hard limit on capacity (e.g. from its NIC) of 100 Mbit/s. The relay is supposed to use up to 25% of its capacity for background traffic and the remaining 75%+ capacity for measurement traffic. Instead the relay ceases carrying background traffic, uses all 100 Mbit/s of capacity to handle measurement traffic, and reports ~33 Mbit/s of background traffic (33/133 = ~25%). FlashFlow would trust this and consider the relay capable of 133 Mbit/s. (If the relay were to report more than ~33 Mbit/s, FlashFlow limits it to just ~33 Mbit/s.) With r=25%, FlashFlow only allows 1.33x weight inflation.
Prior work shows that Torflow allows weight inflation by a factor of 89x [0] or even 177x [1].
The ratio chosen is a trade-off between impact on background traffic and security: r=50% allows a relay to double its weight but won't impact client traffic for relays with steady state throughput below 50%, while r=10% allows a very low inflation factor but will cause throttling of client traffic at far more relays. We suggest r=25% (and thus 1/(1-0.25)=1.33x inflation) for a reasonable trade-off between performance and security.
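The 1/(1-r) bound follows from the clamp: a relay that really carries C of measurement traffic can have at most Cr/(1-r) of claimed background traffic accepted, for a total of C/(1-r). A small Python sketch, with a hypothetical function name and r given as an integer percentage:

```python
def max_inflation_factor(ratio_percent):
    """Upper bound on measured/true capacity for a given background ratio."""
    if not 0 <= ratio_percent <= 99:
        raise ValueError("ratio_percent must be in [0, 99]")
    return 100 / (100 - ratio_percent)
```

This gives about 1.33 for r=25%, exactly 2 for r=50%, and about 1.11 for r=10%.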
It may be possible to catch relays performing this attack, especially if they literally drop all background traffic during the measurement: have the measurer (or some party on its behalf) create a regular stream through the relay and measure the throughput on the stream before/during/after the measurement. This can be explored longer term.
3.3.3 Incomplete Authentication
The short term FlashFlow implementation has the relay set two torrc options if they would like to allow themselves to be measured: a flag allowing measurement, and the list of coordinator TLS certificates that are allowed to start a measurement.
The relay drops MSM_PARAMS cells from coordinators it does not trust, and immediately closes the connection after that. A FF coordinator cannot convince a relay to enter measurement mode unless the relay trusts its TLS certificate.
A trusted coordinator specifies in the MSM_PARAMS cell the IP addresses of the measurers the relay shall expect to connect to it shortly. The target adds the measurer IP addresses to a whitelist in the DoS connection limit system, exempting them from any configured connection limit. If a measurer is behind a NAT, an adversary behind the same NAT can DoS the relay's available sockets until the end of the measurement. The adversary could also pretend to be the measurer. Such an adversary could induce measurement failures and inaccuracies. (Note: the whitelist is cleared after the measurement is over.)
4. FlashFlow measurement system: Medium term
The medium term deployment stage begins after FlashFlow has been implemented and relays are starting to update to a version of Tor that supports it.
We plan to host a FlashFlow deployment consisting of a FF coordinator and a single FF measurer on a single 1 Gbit/s machine. Data produced by this deployment will be made available (semi?) publicly, including both v3bw files and intermediate results.
Any development changes needed during this time would go through separate proposals.
5. FlashFlow measurement system: Long term
In the long term, finishing-touch development work will be done, including adding better authentication and measurement scheduling, and experiments will be run to determine the best way to integrate FlashFlow into the Tor ecosystem.
Any development changes needed during this time would go through separate proposals.
5.1 Authentication to Target Relay
Short term deployment already had FlashFlow coordinators using TLS certificates when connecting to relays, but in the long term, directory authorities will vote on the consensus parameter for which coordinators should be allowed to perform measurements. The voting is done in the same way they currently vote on recommended tor versions.
FlashFlow measurers will be updated to use TLS certificates when connecting to relays too. FlashFlow coordinators will update the contents of MSM_PARAMS cells to contain measurer TLS certificates instead of IP addresses, and relays will update to expect this change.
5.2 Measurement Scheduling
Short term deployment only has one FF deployment running. Long term this may no longer be the case because, for example, more than one directory authority decides to adopt it and they each want to run their own deployment. FF deployments will need to coordinate between themselves to not measure the same relay at the same time, and to handle new relays as they join during the middle of a measurement period (during the day).
The following is quoted from Section 4.3 of the FlashFlow paper.
To measure all relays in the network, the BWAuths periodically determine the measurement schedule. The schedule determines when and by whom a relay should be measured. We assume that the BWAuths have sufficiently synchronized clocks to facilitate coordinating their schedules. A measurement schedule is created for each measurement period, the length p of which determines how often a relay is measured. We use a measurement period of p = 24 hours.
To help avoid active denial-of-service attacks on targeted relays, the measurement schedule is randomized and known only to the BWAuths. Before the next measurement period starts, the BWAuths collectively generate a random seed (e.g. using Tor’s secure-randomness protocol). Each BWAuth can then locally determine the shared schedule using pseudorandom bits extracted from that seed. The algorithm to create the schedule considers each measurement period to be divided into a sequence of t-second measurement slots. For each old relay, slots for each BWAuth to measure it are selected uniformly at random without replacement from all slots in the period that have sufficient unallocated measurement capacity to accommodate the measurement. When a new relay appears, it is measured separately by each BWAuth in the first slots with sufficient unallocated capacity. Note that this design ensures that old relays will continue to be measured, with new relays given secondary priority in the order they arrive.
5.3 Experiments
[XXX todo]
5.4 Other Changes/Investigations/Ideas
- How can FlashFlow data be used in a way that doesn't lead to poor load balancing given the following items that lead to non-uniform client behavior:
  - Guards that high-traffic HSs choose (for 3 months at a time)
  - Guard vs middle flag allocation issues
  - New Guard nodes (Guardfraction)
  - Exit policies other than default/all
  - Directory activity
  - Total onion service activity
  - Super long-lived circuits
- Add a cell that the target relay sends to the coordinator indicating its CPU and memory usage, whether it has a shortage of sockets, how much bandwidth load it has been experiencing lately, etc. Use this information to lower a relay's weight, never increase.
- If FlashFlow and sbws work together (as opposed to FlashFlow replacing sbws), consider logic for how much sbws can increase/decrease FF results
- Coordination of multiple FlashFlow deployments: scheduling of measurements, seeding schedule with shared random value.
- Other background/measurement traffic ratios. Dynamic? (known slow relay => more allowed bg traffic?)
- Catching relays inflating their measured capacity by dropping background traffic.
- What to do about co-located relays. Can they be detected reliably? Should we just add a torrc option a la MyFamily for co-located relays?
- What is the explanation for dennis.jackson's scary graphs in this [2] ticket? Was it because of the speed test? Why? Will FlashFlow produce the same behavior?
6. Citations
[0] F. Thill. Hidden Service Tracking Detection and Bandwidth Cheating in Tor Anonymity Network. Master’s thesis, Univ. Luxembourg, 2014.
[1] A. Johnson, R. Jansen, N. Hopper, A. Segal, and P. Syverson. PeerFlow: Secure Load Balancing in Tor. Proceedings on Privacy Enhancing Technologies (PoPETs), 2017(2), April 2017.
[2] Mike Perry: Graph onionperf and consensus information from Rob's experiments https://trac.torproject.org/projects/tor/ticket/33076
Hi,
Thanks for this proposal!
I'm looking forward to more secure bandwidth measurements on the Tor network.
Overall, this proposal looks good.
But I'm particularly concerned about any communication between bandwidth coordinators. Our general principle is that directory authorities should be independent, and we should minimise their communication and dependencies. This principle also extends to bandwidth authorities.
For FlashFlow, here are some specific reasons to avoid bandwidth coordinator communication:
* it adds complexity to the protocol
* it adds an additional failure mode: failure of coordinator communication
* if the communication is required, this failure mode becomes a denial of service vulnerability
* if the communication is optional, the failure could activate a less-tested fallback mode, and change coordinator behaviour
* it adds a class of additional bugs: coordinator miscommunication, including race conditions
* it adds a class of additional security vulnerabilities, via coordinator communication
* it adds additional coordinator configuration, which must stay synchronised. There's two ways to sync config:
  * in the consensus: the coordinator IP addresses are public, or
  * privately: the configs easily get out of sync
There's also some information missing from the proposal, I'll point it out as part of this review.
On 24 Apr 2020, at 04:48, Matt Traudt pastly@torproject.org wrote:
Filename: xxx-flashflow.txt Title: FlashFlow: A Secure Speed Test for Tor (Parent Proposal) Author: Matthew Traudt, Aaron Johnson, Rob Jansen, Mike Perry Created: 23 April 2020 Status: Draft
...
3.1.2 New Cell Types
FlashFlow will introduce a new cell command MEASURE.
The payload of each MEASURE cell consists of:
  Measure command [1 byte]
  Length          [2 bytes]
  Data            [Length-3 bytes]
The measure commands are:
  0 -- MSM_PARAMS    [forward]
  1 -- MSM_PARAMS_OK [backward]
  2 -- MSM_ECHO      [forward and backward]
  3 -- MSM_BG        [backward]
  4 -- MSM_ERR       [forward and backward]
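As a minimal sketch of the MEASURE payload framing described above, the following packs and unpacks the command, Length, and Data fields. It assumes network (big-endian) byte order and reads "Data [Length-3 bytes]" as meaning that Length counts the 3 header bytes as well; the function names are hypothetical and not part of the proposal.

```python
import struct

# Measure command values as listed in the proposal.
MSM_PARAMS, MSM_PARAMS_OK, MSM_ECHO, MSM_BG, MSM_ERR = range(5)

HEADER_LEN = 3  # 1-byte measure command + 2-byte Length


def pack_measure(command: int, data: bytes) -> bytes:
    """Build a MEASURE payload. Assumption: Length includes the header."""
    return struct.pack("!BH", command, HEADER_LEN + len(data)) + data


def unpack_measure(payload: bytes):
    """Return (command, data), ignoring any padding after Length bytes."""
    if len(payload) < HEADER_LEN:
        raise ValueError("truncated MEASURE header")
    command, length = struct.unpack_from("!BH", payload)
    if length < HEADER_LEN or length > len(payload):
        raise ValueError("Length field out of range")
    return command, payload[HEADER_LEN:length]
```

Rejecting a Length smaller than 3 or larger than the payload keeps a malformed cell from being read past its end.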
Readability note:
"MSM" is a standard abbreviation for "mainstream media".
A standard abbreviation for measurement is "MEAS": https://www.abbreviations.com/abbreviation/MEASurement
...
3.1.3 Pre-Measurement Handshaking/Starting a Measurement
The coordinator connects to the target relay and sends it a MSM_PARAMS cell.
How much of the tor link protocol does the coordinator implement?
Currently, tor requires the following cells:
* VERSIONS
* NETINFO
https://gitweb.torproject.org/torspec.git/tree/tor-spec.txt#n542
If the target is unwilling to be measured at this time or if the coordinator didn't use a TLS certificate that the target trusts, it responds with an error cell and closes the connection. Otherwise it checks that the parameters of the measurement are acceptable (e.g. the version is acceptable, the duration isn't too long, etc.). If the target is happy, it sends a MSM_PARAMS_OK, otherwise it sends a MSM_ERR and closes the connection.
Upon learning the IP addresses of the measurers from the coordinator in the MSM_PARAMS cell, the target whitelists their IPs in its DoS detection subsystem until the measurement ends (successfully or otherwise), at which point the whitelist is cleared.
Upon receiving a MSM_PARAMS_OK from the target, the coordinator will instruct the measurers to open their TCP connections with the target. If the coordinator or any measurer receives a MSM_ERR, it reports the error to the coordinator and considers the measurement a failure. It is also a failure if any measurer is unable to open at least half of its TCP connections with the target.
The payload of MSM_PARAMS cells [XXX more may need to be added]:
- version [1 byte]
What are the minimum and maximum valid values for this field? 0..255 ? 1..255 ?
Tor uses a standard ext-type-length-value format for new cell fields, rather than parsing them based on a version field.
It may still be useful to have a version field for information purposes. (And to work around bugs in older versions.) Normally, we'd use Tor Relay protocol versions, but the coordinators and measurers aren't in the consensus.
Here's an example of the ext-type-length-value format:
  N_EXTENSIONS     [1 byte]
  N_EXTENSIONS times:
     EXT_FIELD_TYPE [1 byte]
     EXT_FIELD_LEN  [1 byte]
     EXT_FIELD      [EXT_FIELD_LEN bytes]
https://gitweb.torproject.org/torspec.git/tree/rend-spec-v3.txt#n1518
You'd want an extension in each measurer info, and also one at the end of the cell, for any new general fields.
Tor already knows how to parse these fields, because they are used for v3 onion services.
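To make the ext-type-length-value layout concrete, here is a minimal parsing sketch. It only walks the N_EXTENSIONS / EXT_FIELD_TYPE / EXT_FIELD_LEN / EXT_FIELD structure quoted above; the function name and return shape are hypothetical, and this is not tor's own parser.

```python
def parse_extensions(buf: bytes, offset: int = 0):
    """Parse an extension block starting at `offset`.

    Returns (extensions, next_offset), where extensions is a list of
    (ext_type, ext_value) tuples in the order they appear.
    """
    if offset >= len(buf):
        raise ValueError("missing N_EXTENSIONS")
    n_extensions = buf[offset]
    offset += 1
    extensions = []
    for _ in range(n_extensions):
        if offset + 2 > len(buf):
            raise ValueError("truncated extension header")
        ext_type, ext_len = buf[offset], buf[offset + 1]
        offset += 2
        if offset + ext_len > len(buf):
            raise ValueError("truncated extension body")
        extensions.append((ext_type, buf[offset:offset + ext_len]))
        offset += ext_len
    return extensions, offset
```

Returning the next offset lets a caller parse one extension block per measurer info and then another at the end of the cell, as suggested above.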
- msm_duration [1 byte]
What are the minimum and maximum valid values for this field? 1..255 ?
Do we want to limit measurements to 4 minutes at a protocol level?
In general, protocols should make invalid states impossible to represent. But do we want a 4 minute hard limit here?
- num_measurers [1 byte]
What are the minimum and maximum valid values for this field? 1..255 ? 1..17 ?
If you're using a standard 509 byte payload, then the practical limits are:
* 84 in the original format
* 50 with link specifiers, extensions, and IPv4 addresses
* 17 with link specifiers, extensions, IPv4, and IPv6 addresses
- measurer_info [num_measurers times]
- ipv4_addr [4 bytes]
What about IPv6 ?
* 30% of tor relays support IPv6
* proposal 311 introduces IPv6 connections between relays, and we're implementing it right now
* IPv4 and IPv6 routing can be different, so their bandwidths can also be different
Instead of a hard-coded IPv4 field, you could use the IPv4 and IPv6 link specifiers, which tor already knows how to parse:
  NSPEC (Number of link specifiers)  [1 byte]
  NSPEC times:
     LSTYPE (Link specifier type)    [1 byte]
     LSLEN  (Link specifier length)  [1 byte]
     LSPEC  (Link specifier)         [LSLEN bytes]
https://gitweb.torproject.org/torspec.git/tree/tor-spec.txt#n1001
As an example, the v3 onion service spec re-uses link specifiers here: https://gitweb.torproject.org/torspec.git/tree/rend-spec-v3.txt#n227
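As an illustration of what a measurer address list could look like in link-specifier form, here is a small encoding sketch. It uses the tor-spec specifier types 00 (TLS-over-TCP IPv4: 4-byte address plus 2-byte port) and 01 (TLS-over-TCP IPv6: 16-byte address plus 2-byte port). Whether FlashFlow would carry a meaningful port for a measurer is an open question, and the function name is hypothetical.

```python
import ipaddress
import struct

LS_IPV4 = 0x00  # TLS-over-TCP, IPv4: 4-byte address + 2-byte port
LS_IPV6 = 0x01  # TLS-over-TCP, IPv6: 16-byte address + 2-byte port


def encode_link_specifiers(endpoints) -> bytes:
    """Encode [(host, port), ...] as NSPEC followed by LSTYPE/LSLEN/LSPEC."""
    if len(endpoints) > 255:
        raise ValueError("NSPEC is a single byte")
    out = bytearray([len(endpoints)])
    for host, port in endpoints:
        addr = ipaddress.ip_address(host)
        lstype = LS_IPV4 if addr.version == 4 else LS_IPV6
        lspec = addr.packed + struct.pack("!H", port)
        out += bytes([lstype, len(lspec)]) + lspec
    return bytes(out)
```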
- num_conns [2 bytes]
This field should probably go before the link specifiers.
What are the minimum and maximum valid values for this field? 1..65535 ?
Do we really need to allow more than 255 connections? Most relays can only handle around 10,000 - 20,000 connections.
version dictates how this MSM_PARAMS cell shall be parsed. msm_duration is the duration, in seconds, that the actual measurement will last. num_measurers is how many measurer_info structs follow. For each measurer, the ipv4_addr it will use when connecting to the target is provided, as is num_conns, the number of TCP connections that measurer will open with the target. Future versions of FlashFlow and MSM_PARAMS will use TLS certificates instead of IP addresses.
FlashFlow won't be able to measure relays behind a NAT, if it authenticates using IP addresses. Relays see the IP address of the NAT device, rather than the IP address of the remote measurer.
For a similar reason, the DOS defences reduce the number of client connections to a relay behind a NAT. So we can safely ignore those relays for the moment.
But it would still be useful to talk about the IP address and NAT issue in this proposal.
MSM_PARAMS_OK has no payload: it's just padding bytes to make the cell 514 bytes long.
Should we add ext-type-length-value fields to this cell?
For example, the MSM_PARAMS_OK cell could be used to communicate the relay's recent CPU load and connection load.
The payload of MSM_ECHO cells:
- arbitrary bytes [max to fill up 514 byte cell]
Note:
Link protocol 3 is still supported, so cells can be 512 or 514 bytes: https://gitweb.torproject.org/torspec.git/tree/tor-spec.txt#n2094
Let's say "PAYLOAD_LEN payload (509 bytes)" instead.
If FlashFlow requires link protocol version 4, let's explain why in this proposal.
The payload of MSM_BG cells:
- second [1 byte]
What are the minimum and maximum valid values for this field? 1..msm_duration ?
- sent_bg_bytes [4 bytes]
- recv_bg_bytes [4 bytes]
What are the minimum and maximum valid values for these fields?
Should we add ext-type-length-value fields to this cell?
For example, the MSM_BG cell could be used to communicate the relay's current CPU load and connection load.
second is the number of seconds since the measurement began. MSM_BG cells are sent once per second from the relay to the FlashFlow coordinator. The first cell will have this set to 1, and each subsequent cell will increment it by one. sent_bg_bytes is the number of background traffic bytes sent in the last second (since the last MSM_BG cell). recv_bg_bytes is the same but for received bytes.
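A minimal sketch of the MSM_BG payload layout described above (second, sent_bg_bytes, recv_bg_bytes), assuming network byte order and unsigned fields. The helper names are hypothetical, and the range check on `second` reflects only the statement that the first cell carries 1.

```python
import struct

MSM_BG_FORMAT = "!BII"  # second [1 byte], sent_bg_bytes [4], recv_bg_bytes [4]


def pack_msm_bg(second: int, sent_bg_bytes: int, recv_bg_bytes: int) -> bytes:
    """Build the MSM_BG data for one per-second report."""
    if not 1 <= second <= 255:
        raise ValueError("second must fit in one byte and start at 1")
    return struct.pack(MSM_BG_FORMAT, second, sent_bg_bytes, recv_bg_bytes)


def unpack_msm_bg(data: bytes):
    """Return (second, sent_bg_bytes, recv_bg_bytes), ignoring padding."""
    return struct.unpack_from(MSM_BG_FORMAT, data)
```

With 4-byte counters the largest per-second value is 2^32 - 1 bytes, roughly 34 Gbit/s, which bounds what a single report can express.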
The payload of MSM_ERR cells:
- err_code [1 byte]
- err_str [possibly zero-len null-terminated string]
We don't have strings in any other tor protocol cells.
If you need extensible error information, can I suggest using ext-type-length-value fields:
  N_EXTENSIONS     [1 byte]
  N_EXTENSIONS times:
     EXT_FIELD_TYPE [1 byte]
     EXT_FIELD_LEN  [1 byte]
     EXT_FIELD      [EXT_FIELD_LEN bytes]
https://gitweb.torproject.org/torspec.git/tree/rend-spec-v3.txt#n1518
If strings are necessary, please specify a character encoding (ASCII or UTF-8), and an allowed set of characters.
If we don't whitelist characters, we risk logging terminal escape sequences, or other arbitrary data.
The error code is one of:
[... XXX TODO ...] 255 -- OTHER
The error string is optional in all cases. It isn't present if the first byte of err_str is null, otherwise it is present. It ends at the first null byte or the end of the cell, whichever comes first.
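The err_str rule above (absent if the first byte is null, otherwise ending at the first null byte or the end of the cell) can be sketched as follows. The proposal does not fix a character encoding, so this sketch assumes printable ASCII and replaces every other byte with '?' before the string could reach a log; that filtering is an assumption, not part of the proposal, and the function name is hypothetical.

```python
from typing import Optional, Tuple


def parse_msm_err(data: bytes) -> Tuple[int, Optional[str]]:
    """Return (err_code, err_str or None) from MSM_ERR data."""
    if not data:
        raise ValueError("empty MSM_ERR payload")
    err_code = data[0]
    # The string ends at the first null byte or at the end of the cell.
    raw = data[1:].split(b"\x00", 1)[0]
    if not raw:
        return err_code, None
    # Assumption: keep printable ASCII only, so that terminal escape
    # sequences and other arbitrary bytes never reach a log file.
    text = "".join(chr(b) if 0x20 <= b <= 0x7E else "?" for b in raw)
    return err_code, text
```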
3.1.4 Measurement Mode
The relay considers the measurement to have started the moment it receives the first MSM_ECHO cell from any measurer.
What happens if the relay never receives a MSM_ECHO cell?
Do MSM_ECHO cells from invalid measurers count?
How much of the tor link protocol does the measurer implement?
Currently, tor requires the following cells:
* VERSIONS
* NETINFO
https://gitweb.torproject.org/torspec.git/tree/tor-spec.txt#n542
At this point, the relay
- Starts a repeating 1s timer on which it will report the amount of background traffic to the coordinator over the coordinator's connection.
- Enters "measurement mode" and limits the amount of background traffic it handles according to the torrc option/consensus parameter.
The relay decrypts and echoes back all MSM_ECHO cells it receives on measurement connections
Are MSM_ECHO cells relay cells? How much of the relay protocol does the measurer implement?
The references to decrypting cells suggest that MSM_ECHO cells are relay (circuit-level) cells. But earlier sections suggest that they are link cells.
If they are link cells, what key material is used for decryption? How do the measurer and relay agree on this key material?
If they are relay cells, do they use the ntor handshake? https://gitweb.torproject.org/torspec.git/tree/tor-spec.txt#n1132
until it has reported its amount of background traffic the same number of times as there are seconds in the measurement (e.g. 30 per-second reports for a 30 second measurement). After sending the last MSM_BG cell, the relay drops all buffered MSM_ECHO cells, closes all measurement connections, and exits measurement mode.
To be more precise here, can we say:
"the relay drops all inbound and outbound MSM_ECHO cells from measurers associated with the completed measurement"
Can we avoid assuming that there is always only one measurement happening at one time?
During the measurement the relay targets a ratio of background traffic to measurement traffic as specified by a consensus parameter/torrc option. For a given ratio r, if the relay has handled x cells of measurement traffic recently, Tor then limits itself to y = xr/(1-r) cells of non-measurement traffic this scheduling round. The target will enforce that a minimum of 10 Mbit/s of measurement traffic is recorded since the last background traffic scheduling round to ensure it always allows some minimum amount of background traffic.
Do you mean "a maximum of 10 Mbit/s of measurement traffic" ?
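For reference, the formula y = xr/(1-r) rearranges to r = y/(x+y), so r behaves as the background share of total (measurement plus background) traffic. A small worked sketch, with a hypothetical function name and the rounding-down choice being an assumption:

```python
def allowed_background_cells(x: int, r: float) -> int:
    """Background cells allowed after x measurement cells, for ratio r.

    Implements y = x*r / (1 - r), rounded down to whole cells.
    """
    if not 0 <= r < 1:
        raise ValueError("r must satisfy 0 <= r < 1")
    return int(x * r / (1 - r))


# Example: with r = 0.25 and 300 measurement cells handled recently,
# the relay allows 100 background cells, and 100 / (300 + 100) = 0.25.
example = allowed_background_cells(300, 0.25)
```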
3.2 FlashFlow Components
The FF coordinator and measurer code will reside in a FlashFlow repository separate from little-t tor.
There are three notable parameters for which a FF deployment must choose values. They are:
- The number of sockets, s, the measurers should open, in aggregate, with the target relay. We suggest s=160 based on the FF paper.
- The bandwidth multiplier, m. Given an existing capacity estimate for a relay, z, the coordinator will instruct the measurers to, in aggregate, send m*z Mbit/s to the target relay. We recommend m=2.25.
- The measurement duration, d. Based on the FF paper, we recommend d=30 seconds.
Are these parameters per-coordinator, or network-wide?
How are they kept in sync between the coordinator and measurers?
The rest of this section first discusses notable functions of the FlashFlow coordinator, then goes on to discuss FF measurer code that will require supporting tor code.
3.2.1 FlashFlow Coordinator
The coordinator is responsible for scheduling measurements, aggregating results, and producing v3bw files. It needs continuous access to new consensus files, which it can obtain by running an accompanying Tor process in client mode.
Recent tor versions go dormant when they haven't built circuits for a while. There are options that prevent dormancy, but they are only designed for interactive applications.
Is the FlashFlow coordinator going to use tor to implement the tor link protocol?
If the coordinator uses tor, then it can use the same tor client instance that's downloading its consensuses.
Otherwise, you might be better off using a small stem script and a download timer.
If you use a timer, you can download each new consensus, shortly after it is created. (Clients often have consensuses that are 1-2 hours old, unless specifically configured to fetch from directory authorities. Even then, they can take up to an hour to download a new consensus.)
The coordinator has the following functions, which will be described in this section:
- result aggregation.
- schedule measurements.
- v3bw file generation.
3.2.1.1 Aggregating Results
Every second during a measurement, the measurers send the amount of verified measurement traffic they have received back from the relay. Additionally, the relay sends a MSM_BG cell each second to the coordinator with the amount of non-measurement background traffic it is sending and receiving.
What happens if some of these cells are dropped by the relay, due to a traffic overload?
If these cells are exempt from the [Relay]Bandwidth{Rate,Burst} options, let's say that in this proposal.
What happens if some of these cells are delayed due to the MSM_ECHO cells?
How long a delay does the coordinator tolerate?
For each second's reports, the coordinator sums the measurers' reports. The coordinator takes the minimum of the relay's reported sent and received background traffic. If, when compared to the measurers' reports for this second, the relay's claimed background traffic is more than what's allowed by the background/measurement traffic ratio, then the coordinator further clamps the relay's report down. The coordinator adds this final adjusted amount of background traffic to the sum of the measurers' reports.
Once the coordinator has done the above for each second in the measurement (e.g. 30 times for a 30 second measurement), the coordinator takes the median of the 30 per-second throughputs and records it as the estimated capacity of the target relay.
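The aggregation steps above can be sketched as follows. The clamp limit is an assumption: the proposal only says the report is clamped to what the ratio allows, so this sketch reuses the relay-side formula y = xr/(1-r) from Section 3.1.4 with x being the summed measurer reports. Function and parameter names are hypothetical.

```python
from statistics import median


def estimate_capacity(measurer_reports, bg_reports, r):
    """Estimate relay capacity from per-second reports.

    measurer_reports: one list per second of per-measurer echoed bytes.
    bg_reports: one (sent_bg_bytes, recv_bg_bytes) tuple per second.
    r: background share allowed by the traffic ratio, 0 <= r < 1.
    """
    per_second = []
    for reports, (sent, recv) in zip(measurer_reports, bg_reports):
        measured = sum(reports)
        # Trust only the smaller of the relay's two background claims.
        background = min(sent, recv)
        # Assumed clamp: no more background than the ratio permits.
        background = min(background, measured * r / (1 - r))
        per_second.append(measured + background)
    return median(per_second)
```

Taking the median means a handful of seconds with delayed or inflated reports does not shift the estimate on its own.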
3.2.1.2 Measurement Schedule
The short term implementation of measurement scheduling will be simpler than the long term one due to (1) there only being one FlashFlow deployment, and (2) there being very few relays that support being measured by FlashFlow. In fact the FF coordinator will maintain a list of the relays that have updated to support being measured and have opted in to being measured, and it will only measure them.
The coordinator divides time into a series of 24 hour periods, commonly referred to as days. Each period has measurement slots that are longer than a measurement lasts (30s), say 60s, to account for pre- and post-measurement work. Thus with 60s slots there's 1,440 slots in a day.
At the start of each day the coordinator considers the list of relays that have opted in to being measured. From this list of relays, it repeatedly takes the relay with the largest existing capacity estimate. It selects a random slot. If the slot has existing relays assigned to it, the coordinator makes sure there is enough additional measurer capacity to handle this relay. If so, it assigns this relay to this slot. If not, it keeps picking new random slots until one has sufficient additional measurer capacity.
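A minimal sketch of the short-term slot assignment described above: relays are taken in order of largest capacity estimate and placed in random slots that still have enough measurer capacity. It assumes that measuring a relay with estimate z consumes m*z of measurer capacity (using the bandwidth multiplier m from Section 3.2), and it simply gives up after `max_tries` failed picks, because the proposal does not say what happens when no slot fits. All names are hypothetical.

```python
import random


def build_schedule(relays, measurer_capacity, n_slots=1440, m=2.25,
                   rng=None, max_tries=10000):
    """Assign each relay to a measurement slot for one day.

    relays: dict mapping fingerprint -> capacity estimate (Mbit/s).
    measurer_capacity: aggregate measurer capacity per slot (Mbit/s).
    Returns a dict mapping fingerprint -> slot index.
    """
    rng = rng or random.Random()
    used = [0.0] * n_slots
    schedule = {}
    # Largest existing capacity estimate first.
    for fp, z in sorted(relays.items(), key=lambda kv: kv[1], reverse=True):
        need = m * z
        for _ in range(max_tries):
            slot = rng.randrange(n_slots)
            if used[slot] + need <= measurer_capacity:
                used[slot] += need
                schedule[fp] = slot
                break
        else:
            # Assumption: the proposal leaves this case unspecified.
            raise RuntimeError("no slot with enough capacity for " + fp)
    return schedule
```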
What if the coordinator doesn't have enough capacity to handle all the relays on the network? (That is, what if all the slots are full?)
What if the capacity is limited at some other point on the internet?
For example:
* an intermediate transit provider between the measurer and all the chosen relays
* the chosen relays are all on the same local network
Relays without existing capacity estimates are assumed to have the 75th percentile capacity of the current network.
If a relay is not online when it's scheduled to be measured, it doesn't get measured that day.
Online in the consensus, or listening via its ORPort? (There's a delay of up to 3 hours here, whenever the relay goes up or down.)
What bandwidth weight does an offline relay get? sbws has had issues because it drops offline relays.
3.2.1.2.1 Example
...
3.2.1.3 Generating V3BW files
Every hour the FF coordinator produces a v3bw file in which it stores the latest capacity estimate for every relay it has measured in the last week. The coordinator will create this file on the host's local file system. Previously-generated v3bw files will not be deleted by the coordinator.
Seems risky, we've seen Torflow fail in the past, because it filled up the disk with bandwidth files.
What's the required disk capacity for a few years of bandwidth files?
A symbolic link at a static path will always point to the latest v3bw file.
    $ ls -l
    v3bw -> v3bw.2020-03-01-05-00-00
    v3bw.2020-03-01-00-00-00
    v3bw.2020-03-01-01-00-00
    v3bw.2020-03-01-02-00-00
    v3bw.2020-03-01-03-00-00
    v3bw.2020-03-01-04-00-00
    v3bw.2020-03-01-05-00-00
You might want to reference the v3bw spec here: https://gitweb.torproject.org/torspec.git/tree/bandwidth-file-spec.txt
3.2.2 FlashFlow Measurer
The measurers take commands from the coordinator
The command protocol is not specified in this proposal.
For example, does the coordinator send the IPv4 and IPv6 addresses of the relay to the measurers?
Which deployment parameters are sent via the protocol, and which are hard-coded in configurations?
connect to target relays with many sockets, send them traffic, and verify the received traffic is the same as what was sent. Measurers need access to a lot of internal tor functionality. One strategy is to house as much logic as possible inside a compile-time-optional control port module that calls into other parts of tor. Alternatively FlashFlow could link against tor and call internal tor functions directly.
[XXX for now I'll assume that an optional little-t tor control port module housing a lot of this code is the best idea.]
Yes, please don't depend on internal, unspecified interfaces.
Notable new things that internal tor code will need to do on the measurer (client) side:
- Open many TLS+TCP connections to the same relay on purpose.
- Verify echo cells.
3.2.2.1 Open many connections
...
3.3 Security
...
- FlashFlow measurement system: Medium term
The medium term deployment stage begins after FlashFlow has been implemented and relays are starting to update to a version of Tor that supports it.
We avoid using tor versions to detect relay features. Instead, we use subprotocol versions:
https://gitweb.torproject.org/torspec.git/tree/tor-spec.txt#n2041
In the first tor release that supports the medium-term FlashFlow, let's reserve a "Link" protocol version:
https://gitweb.torproject.org/torspec.git/tree/tor-spec.txt#n2094
If any of the FlashFlow cells are relay cells, let's also reserve a "Relay" protocol version:
https://gitweb.torproject.org/torspec.git/tree/tor-spec.txt#n2122
(We don't want to pick the exact version numbers yet. Let's wait until the actual tor release.)
We plan to host a FlashFlow deployment consisting of a FF coordinator and a single FF measurer on a single 1 Gbit/s machine. Data produced by this deployment will be made available (semi?) publicly, including both v3bw files and intermediate results.
All directory authorities publish v3bw files at a standard URL, so if you use these files in voting, they will be public.
Any development changes needed during this time would go through separate proposals.
- FlashFlow measurement system: Long term
In the long term, finishing-touch development work will be done, including adding better authentication and measurement scheduling, and experiments will be run to determine the best way to integrate FlashFlow into the Tor ecosystem.
Any development changes needed during this time would go through separate proposals.
5.1 Authentication to Target Relay
Short term deployment already had FlashFlow coordinators using TLS certificates when connecting to relays, but in the long term, directory authorities will vote on the consensus parameter for which coordinators should be allowed to perform measurements. The voting is done in the same way they currently vote on recommended tor versions.
FlashFlow measurers will be updated to use TLS certificates when connecting to relays too. FlashFlow coordinators will update the contents of MSM_PARAMS cells to contain measurer TLS certificates instead of IP addresses, and relays will update to expect this change.
You'll want another new "Link" protocol version for this feature. And another type of link specifier.
5.2 Measurement Scheduling
Short term deployment only has one FF deployment running. Long term this may no longer be the case because, for example, more than one directory authority decides to adopt it and they each want to run their own deployment. FF deployments will need to coordinate between themselves to not measure the same relay at the same time, and to handle new relays as they join during the middle of a measurement period (during the day).
The following is quoted from Section 4.3 of the FlashFlow paper.
To measure all relays in the network, the BWAuths periodically determine the measurement schedule. The schedule determines when and by whom a relay should be measured. We assume that the BWAuths have sufficiently synchronized clocks to facilitate coordinating their schedules. A measurement schedule is created for each measurement period, the length p of which determines how often a relay is measured. We use a measurement period of p = 24 hours.
To help avoid active denial-of-service attacks on targeted relays, the measurement schedule is randomized and known only to the BWAuths. Before the next measurement period starts, the BWAuths collectively generate a random seed (e.g. using Tor’s secure-randomness protocol). Each BWAuth can then locally determine the shared schedule using pseudorandom bits extracted from that seed.
As noted above, communication between BWAuths reduces their independence, and adds additional risk and complexity in the protocol.
Once-Off Shared Secret Exchange
Here's an alternative protocol, that does not require an additional shared random implementation:
1. The BWAuths manually exchange a shared secret key (SHARED_SECRET) out-of-band
2. Every day, the BWAuths independently derive a shared secret seed for the measurement protocol, using a hash function (H), and tor's public shared random value (SRV):
DAILY_SECRET = H(SHARED_SECRET | SRV)
We might also want to use the period number here, like the SRV and onion service hash ring specs.
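A minimal sketch of the derivation DAILY_SECRET = H(SHARED_SECRET | SRV), reading "|" as concatenation. The choice of SHA3-256 for H and the 8-byte big-endian encoding of the optional period number are assumptions made for illustration; neither is specified above.

```python
import hashlib
from typing import Optional


def daily_secret(shared_secret: bytes, srv: bytes,
                 period_num: Optional[int] = None) -> bytes:
    """Derive the per-day schedule seed from the shared key and the SRV."""
    h = hashlib.sha3_256()  # assumption: H is SHA3-256
    h.update(shared_secret)
    h.update(srv)
    if period_num is not None:
        # Assumption: the period number is appended as 8 bytes, big-endian.
        h.update(period_num.to_bytes(8, "big"))
    return h.digest()
```

Because every input is either shared once out-of-band or already public in the consensus, each BWAuth can compute the same value without talking to the others.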
The shared secret key should be rotated:
* each time a new BWAuth is added or removed from the network, and
* 1 year after the last rotation.
The key rotation can be performed over a few days, because:
* each BWAuth has one of two keys: the new key, or the old key,
* overlaps should be rare in practice,
* when there is an overlap, at most two BWAuths will overlap, one from each key,
* overlaps have a low impact for most relays.
The algorithm to create the schedule considers each measurement period to be divided into a sequence of t-second measurement slots. For each old relay, slots for each BWAuth to measure it are selected uniformly at random without replacement from all slots in the period that have sufficient unallocated measurement capacity to accommodate the measurement. When a new relay appears, it is measured separately by each BWAuth in the first slots with sufficient unallocated capacity. Note that this design ensures that old relays will continue to be measured, with new relays given secondary priority in the order they arrive.
It's unclear whether this protocol is interactive or not.
Here's a protocol that is explicitly non-interactive:
1. Coordinators are assigned a daily order, based on each coordinator's certificate hash, and the current DAILY_SECRET.
2. For each coordinator, in the daily order:
   a. Relays in a chosen consensus choose a slot at random, based on the DAILY_SECRET, the relay key, and the iteration number.
   b. If another coordinator is already measuring that relay in that slot, increase the iteration number, and repeat from a.
   c. If the slot is full for the current coordinator, increase the iteration number, and repeat from a.
   d. Otherwise, allocate that relay to that slot, for that coordinator.
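The non-interactive steps above can be sketched as follows. Several details are assumptions made only so the sketch runs: SHA3-256 as the hash, fixed-length byte strings for certificate hashes and relay keys, relays processed in sorted key order, slot fullness modelled as a simple per-slot relay count, and a cap on the number of iterations. All names are hypothetical.

```python
import hashlib


def _h(*parts: bytes) -> bytes:
    # Assumption: inputs are fixed-length, so plain concatenation is unambiguous.
    return hashlib.sha3_256(b"".join(parts)).digest()


def shared_schedule(daily_secret, coordinator_cert_hashes, relay_keys,
                    n_slots, per_slot_limit, max_iterations=1000):
    """Return {(coordinator, relay): slot}, computed without any interaction."""
    # Step 1: daily coordinator order derived from the DAILY_SECRET.
    order = sorted(coordinator_cert_hashes, key=lambda c: _h(daily_secret, c))
    claimed = set()  # (relay, slot) pairs already taken by any coordinator
    schedule = {}
    for coord in order:  # step 2
        load = [0] * n_slots
        for relay in sorted(relay_keys):
            for iteration in range(max_iterations):
                digest = _h(daily_secret, relay, iteration.to_bytes(4, "big"))
                slot = int.from_bytes(digest[:8], "big") % n_slots
                # Retry if another coordinator measures this relay in this
                # slot, or if this coordinator's slot is already full.
                if (relay, slot) in claimed or load[slot] >= per_slot_limit:
                    continue
                claimed.add((relay, slot))
                load[slot] += 1
                schedule[(coord, relay)] = slot
                break
            else:
                raise RuntimeError("no free slot found for relay")
    return schedule
```

Since the result depends only on the daily secret and the two sorted input sets, every coordinator that starts from the same consensus arrives at the same schedule.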
We might also want to use other shared data here, like the consensus timestamp.
To make sure all the coordinators have the same consensus, we should keep a copy of the most recent shared consensus. Here's how we can select a shared consensus:
* if we're using a scheduled fetch, a consensus from at least 1 hour ago (usually 2300 UTC),
* if we're using a tor client to fetch, a consensus from at least 3 hours ago (usually 2100 UTC).
If there isn't a consensus for that time, we should keep the most recent consensus before that time.
It doesn't actually matter if the consensus is a little out of sync: most relays will have the same fingerprints, and end up in the same slots.
5.3 Experiments
[XXX todo]
5.4 Other Changes/Investigations/Ideas
...
6. Citations
[0] F. Thill. Hidden Service Tracking Detection and Bandwidth Cheating in Tor Anonymity Network. Master’s thesis, Univ. Luxembourg, 2014.
[1] A. Johnson, R. Jansen, N. Hopper, A. Segal, and P. Syverson. PeerFlow: Secure Load Balancing in Tor. Proceedings on Privacy Enhancing Technologies (PoPETs), 2017(2), April 2017.
[2] Mike Perry: Graph onionperf and consensus information from Rob's experiments https://trac.torproject.org/projects/tor/ticket/33076
T
Thanks for the review, Teor. We really appreciate it.
Comments/responses inline with some trimming at the beginning (I gave up and just left everything in after the first couple of responses).
On 4/23/20 21:05, teor wrote:
...
But I'm particularly concerned about any communication between bandwidth coordinators. Our general principle is that directory authorities should be independent, and we should minimise their communication and dependencies. This principle also extends to bandwidth authorities.
For FlashFlow, here are some specific reasons to avoid bandwidth coordinator communication:
- it adds complexity to the protocol
- it adds an additional failure mode: failure of coordinator communication
- if the communication is required, this failure mode becomes a denial of service vulnerability
- if the communication is optional, the failure could activate a less-tested fallback mode, and change coordinator behaviour
- it adds a class of additional bugs: coordinator miscommunication, including race conditions
- it adds a class of additional security vulnerabilities, via coordinator communication
- it adds additional coordinator configuration, which must stay synchronised. There's two ways to sync config:
- in the consensus: the coordinator IP addresses are public, or
- privately: the configs easily get out of sync
We do not envision inter-coordinator communication other than consensus parameter voting and rare out-of-band human-to-human "hey we should change X parameter because [...]".
Each coordinator can calculate every coordinator's measurement schedule for the entire measurement period independently given only inputs present in the consensus (e.g. the shared random value). I believe the under-specified section *5.2 Measurement Scheduling* is the primary source of your concern here.
On 24 Apr 2020, at 04:48, Matt Traudt pastly@torproject.org wrote:
...
The measure commands are:
  0 -- MSM_PARAMS    [forward]
  1 -- MSM_PARAMS_OK [backward]
  2 -- MSM_ECHO      [forward and backward]
  3 -- MSM_BG        [backward]
  4 -- MSM_ERR       [forward and backward]
Readability note:
"MSM" is a standard abbreviation for "mainstream media".
A standard abbreviation for measurement is "MEAS": https://www.abbreviations.com/abbreviation/MEASurement
Fair, why-didn't-I-think-of-that point.
...
3.1.3 Pre-Measurement Handshaking/Starting a Measurement
The coordinator connects to the target relay and sends it a MSM_PARAMS cell.
How much of the tor link protocol does the coordinator implement?
Currently, tor requires the following cells:
- VERSIONS
- NETINFO
https://gitweb.torproject.org/torspec.git/tree/tor-spec.txt#n542
Coordinators use a Tor client (with requisite new features) to communicate with the relay. So, none of the link protocol.
If the target is unwilling to be measured at this time or if the coordinator didn't use a TLS certificate that the target trusts, it responds with an error cell and closes the connection. Otherwise it checks that the parameters of the measurement are acceptable (e.g. the version is acceptable, the duration isn't too long, etc.). If the target is happy, it sends a MSM_PARAMS_OK, otherwise it sends a MSM_ERR and closes the connection.
Upon learning the IP addresses of the measurers from the coordinator in the MSM_PARAMS cell, the target whitelists their IPs in its DoS detection subsystem until the measurement ends (successfully or otherwise), at which point the whitelist is cleared.
Upon receiving a MSM_PARAMS_OK from the target, the coordinator will instruct the measurers to open their TCP connections with the target. If the coordinator receives a MSM_ERR, it considers the measurement a failure. If any measurer receives a MSM_ERR, it reports the error to the coordinator, which likewise considers the measurement a failure. It is also a failure if any measurer is unable to open at least half of its TCP connections with the target.
The payload of MSM_PARAMS cells [XXX more may need to be added]:
- version [1 byte]
What are the minimum and maximum valid values for this field? 0..255 ? 1..255 ?
Tor uses a standard ext-type-length-value format for new cell fields, rather than parsing them based on a version field.
You've provided lots of feedback consisting of
- min/max value questions
- suggestion of ext-type-length-value format
And it's all appreciated and valuable. We're not as up to speed on the latest Tor coding conventions.
It may still be useful to have a version field for information purposes. (And to work around bugs in older versions.) Normally, we'd use Tor Relay protocol versions, but the coordinators and measurers aren't in the consensus.
Here's an example of the ext-type-length-value format:
N_EXTENSIONS     [1 byte]
N_EXTENSIONS times:
  EXT_FIELD_TYPE [1 byte]
  EXT_FIELD_LEN  [1 byte]
  EXT_FIELD      [EXT_FIELD_LEN bytes]
https://gitweb.torproject.org/torspec.git/tree/rend-spec-v3.txt#n1518
You'd want an extension in each measurer info, and also one at the end of the cell, for any new general fields.
Tor already knows how to parse these fields, because they are used for v3 onion services.
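To make the ext-type-length-value layout above concrete, here is a minimal parsing sketch. It is not tor's parser: the function name `parse_extensions` and the choice to raise `ValueError` on truncated input are assumptions made for illustration only.

```python
from typing import List, Tuple


def parse_extensions(buf: bytes, offset: int = 0) -> Tuple[List[Tuple[int, bytes]], int]:
    """Parse N_EXTENSIONS followed by that many (type, len, value) entries.

    Returns the list of (ext_type, ext_value) pairs and the offset of the
    first byte after the extension block.
    """
    if offset >= len(buf):
        raise ValueError("truncated: missing N_EXTENSIONS")
    n_extensions = buf[offset]
    offset += 1
    extensions = []
    for _ in range(n_extensions):
        # Each entry starts with EXT_FIELD_TYPE and EXT_FIELD_LEN.
        if offset + 2 > len(buf):
            raise ValueError("truncated: missing extension header")
        ext_type, ext_len = buf[offset], buf[offset + 1]
        offset += 2
        if offset + ext_len > len(buf):
            raise ValueError("truncated: extension body too short")
        extensions.append((ext_type, buf[offset:offset + ext_len]))
        offset += ext_len
    return extensions, offset
```

Because each entry carries its own length, a parser that does not recognise a type can skip it, which is what makes the format extensible without a version bump.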
- msm_duration [1 byte]
What are the minimum and maximum valid values for this field? 1..255 ?
Do we want to limit measurements to 4 minutes at a protocol level?
In general, protocols should make invalid states impossible to represent. But do we want a 4 minute hard limit here?
This document suggests a measurement duration of 30 seconds. We see no reason to ever go above 1 minute. If there's a byte to spare, then sure let's make this a uint16.
- num_measurers [1 byte]
What are the minimum and maximum valid values for this field? 1..255 ? 1..17 ?
If you're using a standard 509 byte payload, then the practical limits are:
- 84 in the original format
- 50 with link specifiers, extensions, and IPv4 addresses
- 17 with link specifiers, extensions, IPv4, and IPv6 addresses
- measurer_info [num_measurers times]
- ipv4_addr [4 bytes]
What about IPv6 ?
- 30% of tor relays support IPv6
- proposal 311 introduces IPv6 connections between relays, and we're implementing it right now
- IPv4 and IPv6 routing can be different, so their bandwidths can also be different
Instead of a hard-coded IPv4 field, you could use the IPv4 and IPv6 link specifiers, which tor already knows how to parse:
NSPEC    (Number of link specifiers)   [1 byte]
NSPEC times:
  LSTYPE (Link specifier type)         [1 byte]
  LSLEN  (Link specifier length)       [1 byte]
  LSPEC  (Link specifier)              [LSLEN bytes]
https://gitweb.torproject.org/torspec.git/tree/tor-spec.txt#n1001
As an example, the v3 onion service spec re-uses link specifiers here: https://gitweb.torproject.org/torspec.git/tree/rend-spec-v3.txt#n227
This is useful, thanks. Yes this is probably a better idea. A maximum of 17 measurers should be enough. It's hard to imagine a bwauth would have more than even 10 hosts they'd want to act as measurers.
- num_conns [2 bytes]
This field should probably go before the link specifiers.
What are the minimum and maximum valid values for this field? 1..65535 ?
Do we really need to allow more than 255 connections? Most relays can only handle around 10,000 - 20,000 connections.
160 connections was determined "to be right" (major glossing over details here) in the experiments we did for the paper. As part of that we also tried 320, which is bigger than 255. While I don't imagine we'll ever want more than 255, two bytes leaves us room.
version dictates how this MSM_PARAMS cell shall be parsed. msm_duration is the duration, in seconds, that the actual measurement will last. num_measurers is how many measurer_info structs follow. For each measurer, the ipv4_addr it will use when connecting to the target is provided, as is num_conns, the number of TCP connections that measurer will open with the target. Future versions of FlashFlow and MSM_PARAMS will use TLS certificates instead of IP addresses.
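As an illustration of the original (pre-feedback) MSM_PARAMS layout described above, the following sketch packs and parses the payload. Network byte order, zero padding to a 509-byte payload, and the function names are assumptions, since the proposal text does not specify them.

```python
import ipaddress
import struct
from typing import List, Tuple

PAYLOAD_LEN = 509  # assumed fixed cell payload size


def pack_msm_params(version: int, msm_duration: int,
                    measurers: List[Tuple[str, int]]) -> bytes:
    """Pack version, msm_duration, num_measurers, then one
    (ipv4_addr [4 bytes], num_conns [2 bytes]) entry per measurer."""
    body = struct.pack("!BBB", version, msm_duration, len(measurers))
    for ipv4_addr, num_conns in measurers:
        body += ipaddress.IPv4Address(ipv4_addr).packed
        body += struct.pack("!H", num_conns)
    if len(body) > PAYLOAD_LEN:
        raise ValueError("too many measurers for one cell")
    return body.ljust(PAYLOAD_LEN, b"\x00")  # pad with zero bytes


def parse_msm_params(payload: bytes) -> Tuple[int, int, List[Tuple[str, int]]]:
    """Inverse of pack_msm_params; trailing padding is ignored."""
    version, msm_duration, num_measurers = struct.unpack_from("!BBB", payload, 0)
    offset = 3
    measurers = []
    for _ in range(num_measurers):
        addr_bytes, num_conns = struct.unpack_from("!4sH", payload, offset)
        measurers.append((str(ipaddress.IPv4Address(addr_bytes)), num_conns))
        offset += 6
    return version, msm_duration, measurers
```

With a 3-byte header and 6 bytes per measurer, 84 entries fit in 509 bytes and 85 do not, which matches the "84 in the original format" limit mentioned in the discussion.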
FlashFlow won't be able to measure relays behind a NAT, if it authenticates using IP addresses. Relays see the IP address of the NAT device, rather than the IP address of the remote measurer.
For a similar reason, the DOS defences reduce the number of client connections to a relay behind a NAT. So we can safely ignore those relays for the moment.
But it would still be useful to talk about the IP address and NAT issue in this proposal.
This is not an issue we had considered. For the short term deployment maybe we can just say not being NAT-ed is a prerequisite for opting in to measurement.
MSM_PARAMS_OK has no payload: it's just padding bytes to make the cell 514 bytes long.
Should we add ext-type-length-value fields to this cell?
For example, the MSM_PARAMS_OK cell could be used to communicate the relay's recent CPU load and connection load.
The payload of MSM_ECHO cells:
- arbitrary bytes [max to fill up 514 byte cell]
Note:
Link protocol 3 is still supported, so cells can be 512 or 514 bytes: https://gitweb.torproject.org/torspec.git/tree/tor-spec.txt#n2094
Let's say "PAYLOAD_LEN payload (509 bytes)" instead.
If FlashFlow requires link protocol version 4, let's explain why in this proposal.
I don't believe FlashFlow requires link protocol 4. Yes the above should be re-worded.
The payload of MSM_BG cells:
- second [1 byte]
What are the minimum and maximum valid values for this field? 1..msm_duration ?
- sent_bg_bytes [4 bytes]
- recv_bg_bytes [4 bytes]
What are the minimum and maximum valid values for these fields?
Should we add ext-type-length-value fields to this cell?
For example, the MSM_BG cell could be used to communicate the relay's current CPU load and connection load.
Indeed we picture MSM_BG cells being used in the longer term to communicate such information. ext-type-length-value fields are probably called for.
second is the number of seconds since the measurement began. MSM_BG cells are sent once per second from the relay to the FlashFlow coordinator. The first cell will have this set to 1, and each subsequent cell will increment it by one. sent_bg_bytes is the number of background traffic bytes sent in the last second (since the last MSM_BG cell). recv_bg_bytes is the same but for received bytes.
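A small sketch of how the MSM_BG payload as currently written could be packed and parsed. Network byte order and the function names are assumptions; if ext-type-length-value fields are added as discussed, the layout would change.

```python
import struct
from typing import Tuple

# second [1 byte], sent_bg_bytes [4 bytes], recv_bg_bytes [4 bytes]
MSM_BG_FORMAT = "!BII"  # big-endian is an assumption


def pack_msm_bg(second: int, sent_bg_bytes: int, recv_bg_bytes: int) -> bytes:
    """Build the meaningful prefix of a MSM_BG payload (padding not added)."""
    return struct.pack(MSM_BG_FORMAT, second, sent_bg_bytes, recv_bg_bytes)


def parse_msm_bg(payload: bytes) -> Tuple[int, int, int]:
    """Return (second, sent_bg_bytes, recv_bg_bytes); extra bytes are ignored."""
    return struct.unpack_from(MSM_BG_FORMAT, payload, 0)
```

Note that a 4-byte counter caps each per-second report at 2^32 - 1 bytes, which is one concrete answer to the min/max question for these fields.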
The payload of MSM_ERR cells:
- err_code [1 byte]
- err_str [possibly zero-len null-terminated string]
We don't have strings in any other tor protocol cells.
If you need extensible error information, can I suggest using ext-type-length-value fields:
N_EXTENSIONS     [1 byte]
N_EXTENSIONS times:
  EXT_FIELD_TYPE [1 byte]
  EXT_FIELD_LEN  [1 byte]
  EXT_FIELD      [EXT_FIELD_LEN bytes]
https://gitweb.torproject.org/torspec.git/tree/rend-spec-v3.txt#n1518
If strings are necessary, please specify a character encoding (ASCII or UTF-8), and an allowed set of characters.
If we don't whitelist characters, we risk logging terminal escape sequences, or other arbitrary data.
I seem to remember strings used between directory authority /directory mirror relays and clients to communicate certain errors (clock skew?), but what's probably reality is a *code* is communicated and what I'm thinking of is merely the Tor client interpreting the code for logging purposes.
Regardless, we probably don't really need a string. It occurs to me we might want *something* that carries more information than a code; for example, a MSM_ERR cell with a code stating "I'm refusing to be measured because I've been measured too recently" would benefit from a field stating either time till measurement allowed again or time since last measurement.
The error code is one of:
[... XXX TODO ...]
255 -- OTHER
The error string is optional in all cases. It isn't present if the first byte of err_str is null, otherwise it is present. It ends at the first null byte or the end of the cell, whichever comes first.
3.1.4 Measurement Mode
The relay considers the measurement to have started the moment it receives the first MSM_ECHO cell from any measurer.
What happens if the relay never receives a MSM_ECHO cell?
Do MSM_ECHO cells from invalid measurers count?
How much of the tor link protocol does the measurer implement? Currently, tor requires the following cells:
- VERSIONS
- NETINFO
https://gitweb.torproject.org/torspec.git/tree/tor-spec.txt#n542
If the relay never receives a MSM_ECHO cell, it never enters measurement mode (thus it never limits background traffic), eventually times out on waiting for the measurement to start, sends MSM_ERR cells to connected measurers/coordinator, and cleans up.
Only measurers that are a part of this measurement can send MSM_ECHO cells. Other measurers shouldn't have even been allowed to connect.
Measurers don't implement the tor link protocol because they require a Tor client to do the hard work for them, similar to how I clarified coordinators above.
At this point, the relay
- Starts a repeating 1s timer on which it will report the amount of background traffic to the coordinator over the coordinator's connection.
- Enters "measurement mode" and limits the amount of background traffic it handles according to the torrc option/consensus parameter.
The relay decrypts and echoes back all MSM_ECHO cells it receives on measurement connections
Are MSM_ECHO cells relay cells? How much of the relay protocol does the measurer implement?
The references to decrypting cells suggest that MSM_ECHO cells are relay (circuit-level) cells. But earlier sections suggest that they are link cells.
If they are link cells, what key material is used for decryption? How do the measurer and relay agree on this key material?
If they are relay cells, do they use the ntor handshake? https://gitweb.torproject.org/torspec.git/tree/tor-spec.txt#n1132
MEASURE cells are cells like CREATE, CREATED, RELAY, etc.
MSM_ECHO, MSM_PARAMS, etc. cells are MEASURE cells in the same way RELAY_BEGIN, RELAY_DATA, RELAY_SENDME, etc. are RELAY cells.
The relay needs to do AES on the MSM_ECHO cells (or for simplicity, the MSM_ECHO cell payload) like it does AES on relay cells. As for key material necessary to do that, that's an oversight. We've neglected to specify how it's derived.
Suggestions? Not a cryptographer, but off the top of my head, the measurer could simply tell the relay to use $key_i for MSM_ECHO cells on $connection_i. We just want the CPU load on the relay; we're not after security properties here (other than verifying the relay is actually doing the crypto, as discussed elsewhere).
until it has reported its amount of background traffic the same number of times as there are seconds in the measurement (e.g. 30 per-second reports for a 30 second measurement). After sending the last MSM_BG cell, the relay drops all buffered MSM_ECHO cells, closes all measurement connections, and exits measurement mode.
To be more precise here, can we say:
"the relay drops all inbound and outbound MSM_ECHO cells from measurers associated with the completed measurement"
Can we avoid assuming that there is always only one measurement happening at one time?
I think it's safe/smart/necessary to assume that, for a given relay, there is always only zero or one measurements happening.
- Measurements are scheduled s.t. coordinators won't try to measure a relay at the same time.
- A coordinator trying to start a measurement while another one is ongoing can simply be sent a MSM_ERR cell stating as such.
- The security arguments behind *3.3.2 Weight Inflation* only make sense when there is only one measurement at a time.
During the measurement the relay targets a ratio of background traffic to measurement traffic as specified by a consensus parameter/torrc option. For a given ratio r, if the relay has handled x cells of measurement traffic recently, Tor then limits itself to y = xr/(1-r) cells of non-measurement traffic this scheduling round. The target will enforce that a minimum of 10 Mbit/s of measurement traffic is recorded since the last background traffic scheduling round to ensure it always allows some minimum amount of background traffic.
Do you mean "a maximum of 10 Mbit/s of measurement traffic" ?
No. When getting ready to handle background traffic, if there has been less than 10 Mbit/s of measurement traffic recently, Tor will limit background traffic as if there was indeed 10 Mbit/s of measurement traffic.
This way the relay can always send at least some background traffic, and a malfunctioning/malicious FlashFlow deployment cannot stop all background client traffic going through a relay for 30 seconds by not sending it (very much) measurement traffic.
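The rule can be written down as a short worked example. This is a sketch under stated assumptions: the function name is made up, all amounts are taken to be in one unit (Mbit/s here, although the proposal text mixes cells and Mbit/s), and the ratios used below are illustrative rather than recommended values.

```python
def background_allowance(measured: float, ratio: float,
                         min_measured: float = 10.0) -> float:
    """Return the background traffic allowed this scheduling round.

    measured     -- measurement traffic handled recently (x in the text)
    ratio        -- target fraction r of total traffic that is background
    min_measured -- floor applied to x (10 Mbit/s in the text)
    """
    if not 0.0 <= ratio < 1.0:
        raise ValueError("ratio must be in [0, 1)")
    # Pretend at least min_measured was seen, so background never drops to 0.
    effective = max(measured, min_measured)
    return effective * ratio / (1.0 - ratio)
```

For example, with r = 0.25 and 300 Mbit/s of measurement traffic, the relay may carry 100 Mbit/s of background traffic (100 is 25% of the 400 total). With no measurement traffic at all and r = 0.5, the floor still allows 10 Mbit/s of background traffic.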
3.2 FlashFlow Components
The FF coordinator and measurer code will reside in a FlashFlow repository separate from little-t tor.
There are three notable parameters for which a FF deployment must choose values. They are:
- The number of sockets, s, the measurers should open, in aggregate, with the target relay. We suggest s=160 based on the FF paper.
- The bandwidth multiplier, m. Given an existing capacity estimate for a relay, z, the coordinator will instruct the measurers to, in aggregate, send m*z Mbit/s to the target relay. We recommend m=2.25.
- The measurement duration, d. Based on the FF paper, we recommend d=30 seconds.
Are these parameters per-coordinator, or network-wide?
How are they kept in sync between the coordinator and measurers?
Per-coordinator. The coordinator tells the measurers the parameters for each measurement.
The rest of this section first discusses notable functions of the FlashFlow coordinator, then goes on to discuss FF measurer code that will require supporting tor code.
3.2.1 FlashFlow Coordinator
The coordinator is responsible for scheduling measurements, aggregating results, and producing v3bw files. It needs continuous access to new consensus files, which it can obtain by running an accompanying Tor process in client mode.
Recent tor versions go dormant when they haven't built circuits for a while. There are options that prevent dormancy, but they are only designed for interactive applications.
Is the FlashFlow coordinator going to use tor to implement the tor link protocol?
If the coordinator uses tor, then it can use the same tor client instance that's downloading its consensuses.
Otherwise, you might just be better using a small stem script, and a download timer.
If you use a timer, you can download each new consensus, shortly after it is created. (Clients often have consensuses that are 1-2 hours old, unless specifically configured to fetch from directory authorities. Even then, they can take up to an hour to download a new consensus.)
As described elsewhere, the coordinator uses a Tor client in order to avoid implementing the tor link protocol itself. If there is not already a way to make a Tor client download every new consensus (e.g. a torrc option or an hourly control port command), we'll want to add that.
The coordinator has the following functions, which will be described in this section:
- result aggregation.
- schedule measurements.
- v3bw file generation.
3.2.1.1 Aggregating Results
Every second during a measurement, the measurers send the amount of verified measurement traffic they have received back from the relay. Additionally, the relay sends a MSM_BG cell each second to the coordinator with the amount of non-measurement background traffic it is sending and receiving.
What happens if some of these cells are dropped by the relay, due to a traffic overload?
If these cells are exempt from the [Relay]Bandwidth{Rate,Burst} options, let's say that in this proposal.
What happens if some of these cells are delayed due to the MSM_ECHO cells?
How long a delay does the coordinator tolerate?
It would be bad for these cells to get dropped. I can't say if that means they need to be exempt from BandwidthRate (etc.) options.
IMO it would be fine if the cells were extremely delayed and they all arrived at the coordinator at the very end of the measurement in a bunch. Though not ideal, it would be "fine" if they arrived out of order or a few were lost. They can be reordered fine, and losing a few "just" leads to inaccuracy.
Obviously I think this should be mitigated. 100%. I'm just saying it's not a measurement failure if these things happen.
For each second's reports, the coordinator sums the measurers' reports. The coordinator takes the minimum of the relay's reported sent and received background traffic. If, when compared to the measurers' reports for this second, the relay's claimed background traffic is more than what's allowed by the background/measurement traffic ratio, then the coordinator further clamps the relay's report down. The coordinator adds this final adjusted amount of background traffic to the sum of the measurers' reports.
Once the coordinator has done the above for each second in the measurement (e.g. 30 times for a 30 second measurement), the coordinator takes the median of the 30 per-second throughputs and records it as the estimated capacity of the target relay.
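The aggregation steps can be sketched as follows. The proposal does not spell out the clamp, so this sketch assumes it is the same y = xr/(1-r) bound used in section 3.1.4; the function and parameter names are hypothetical.

```python
from statistics import median
from typing import List, Sequence


def estimate_capacity(measurer_reports: Sequence[Sequence[float]],
                      bg_sent: Sequence[float],
                      bg_recv: Sequence[float],
                      ratio: float) -> float:
    """Return the median per-second throughput.

    measurer_reports[i][s] is measurer i's verified echo traffic in second s.
    bg_sent[s] and bg_recv[s] are the relay's MSM_BG claims for second s.
    """
    per_second: List[float] = []
    for s in range(len(bg_sent)):
        measured = sum(reports[s] for reports in measurer_reports)
        # Trust only the smaller of the relay's two background claims.
        claimed_bg = min(bg_sent[s], bg_recv[s])
        # Assumed clamp: background may not exceed what the ratio permits.
        allowed_bg = measured * ratio / (1.0 - ratio)
        per_second.append(measured + min(claimed_bg, allowed_bg))
    return median(per_second)
```

Using the median means a relay cannot raise its estimate by inflating its background claims for a few seconds, and the clamp bounds how much any single second can be inflated.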
3.2.1.2 Measurement Schedule
The short term implementation of measurement scheduling will be simpler than the long term one due to (1) there only being one FlashFlow deployment, and (2) there being very few relays that support being measured by FlashFlow. In fact the FF coordinator will maintain a list of the relays that have updated to support being measured and have opted in to being measured, and it will only measure them.
The coordinator divides time into a series of 24 hour periods, commonly referred to as days. Each period has measurement slots that are longer than a measurement lasts (30s), say 60s, to account for pre- and post-measurement work. Thus with 60s slots there are 1,440 slots in a day.
At the start of each day the coordinator considers the list of relays that have opted in to being measured. From this list of relays, it repeatedly takes the relay with the largest existing capacity estimate. It selects a random slot. If the slot has existing relays assigned to it, the coordinator makes sure there is enough additional measurer capacity to handle this relay. If so, it assigns this relay to this slot. If not, it keeps picking new random slots until one has sufficient additional measurer capacity.
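A minimal sketch of this short-term scheduling loop. It assumes the measurer capacity a relay needs is m*z (the bandwidth multiplier times its capacity estimate, from section 3.2), and it tries slots in a shuffled order instead of re-drawing at random forever, so it terminates when no slot fits. The function name and the `None` return for unschedulable relays are illustrative choices.

```python
import random
from typing import Dict, Optional


def schedule_day(capacity_estimates: Dict[str, float],
                 measurer_capacity: float,
                 multiplier: float = 2.25,
                 num_slots: int = 1440,
                 rng: Optional[random.Random] = None) -> Dict[str, Optional[int]]:
    """Map each relay to a slot index, or None if no slot has room."""
    rng = rng or random.Random()
    remaining = [measurer_capacity] * num_slots  # unallocated capacity per slot
    schedule: Dict[str, Optional[int]] = {}
    # Take relays in order of largest existing capacity estimate first.
    for relay in sorted(capacity_estimates, key=capacity_estimates.get, reverse=True):
        needed = multiplier * capacity_estimates[relay]
        slots = list(range(num_slots))
        rng.shuffle(slots)  # random slot order, never retrying a failed slot
        schedule[relay] = None
        for slot in slots:
            if remaining[slot] >= needed:
                remaining[slot] -= needed
                schedule[relay] = slot
                break
    return schedule
```

Placing the largest relays first makes it less likely that a big relay finds every slot partially filled by small ones.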
What if the coordinator doesn't have enough capacity to handle all the relays on the network? (That is, what if all the slots are full?)
We can adjust the definition of the measurement period to: the maximum of
1. 24 hours, and
2. the amount of time the FlashFlow deployment with least capacity will take to measure the entire network + some factor.
If new relays appear during the day and all slots have been filled, that's unfortunate but they will just wait till the next day.
What if the capacity is limited at some other point on the internet?
For example:
- an intermediate transit provider between the measurer and all the chosen relays
- the chosen relays are all on the same local network
Ideally a single FlashFlow deployment's measurers are diverse to help mitigate the first point.
For the second, I don't have a good idea at this time. That shouldn't happen regularly. It will happen sometimes though, so perhaps this motivates a modification in how the coordinator chooses the weight for a relay. Instead of the result of the latest measurement, perhaps the highest result from the last X measurements.
Relays without existing capacity estimates are assumed to have the 75th percentile capacity of the current network.
If a relay is not online when it's scheduled to be measured, it doesn't get measured that day.
Online in the consensus, or listening via its ORPort? (There's a delay of up to 3 hours here, whenever the relay goes up or down.)
What bandwidth weight does an offline relay get? sbws has had issues because it drops offline relays.
Online as in both, I think.
I'm not up to speed or have forgotten why continuing to give weight to offline relays is important (and this may not be the place to enlighten me). Naively I'd say zero. Assuming that's stupid, I **think** whatever weight FlashFlow would give it were it online is smarter than some minimum weight value. Suggestions?
3.2.1.2.1 Example
...
3.2.1.3 Generating V3BW files
Every hour the FF coordinator produces a v3bw file in which it stores the latest capacity estimate for every relay it has measured in the last week. The coordinator will create this file on the host's local file system. Previously-generated v3bw files will not be deleted by the coordinator.
Seems risky, we've seen Torflow fail in the past, because it filled up the disk with bandwidth files.
What's the required disk capacity for a few years of bandwidth files?
We can ship a script or provide a parameter to keep the last X v3bw files if that would be preferable to relying on bwauths using logrotate themselves or otherwise finding an archival/deletion strategy that fits their needs.
A symbolic link at a static path will always point to the latest v3bw file.
$ ls -l
v3bw -> v3bw.2020-03-01-05-00-00
v3bw.2020-03-01-00-00-00
v3bw.2020-03-01-01-00-00
v3bw.2020-03-01-02-00-00
v3bw.2020-03-01-03-00-00
v3bw.2020-03-01-04-00-00
v3bw.2020-03-01-05-00-00
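One possible shape for the "keep the last X v3bw files" idea, combined with updating the static symlink. This is a sketch, not FlashFlow code: the function name, the `keep` parameter and the temporary link name are assumptions, and it relies on POSIX rename semantics to swap the link atomically.

```python
import os
import time
from typing import Optional


def publish_v3bw(directory: str, contents: str, keep: int = 168,
                 now: Optional[float] = None) -> str:
    """Write a timestamped v3bw file, repoint the "v3bw" link, prune old files."""
    name = time.strftime("v3bw.%Y-%m-%d-%H-%M-%S", time.gmtime(now))
    path = os.path.join(directory, name)
    with open(path, "w") as f:
        f.write(contents)
    # Build the new link under a temporary name, then rename it over the old
    # one so readers never see a missing or half-updated "v3bw".
    tmp_link = os.path.join(directory, "v3bw.tmp-link")
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(name, tmp_link)
    os.replace(tmp_link, os.path.join(directory, "v3bw"))
    # Timestamped names sort chronologically, so the oldest come first.
    files = sorted(n for n in os.listdir(directory) if n.startswith("v3bw."))
    for old in files[:max(0, len(files) - keep)]:
        os.remove(os.path.join(directory, old))
    return path
```

The default of 168 keeps one week of hourly files, matching the one-week window the v3bw file covers. Operators who prefer logrotate or their own archival could set `keep` very high and manage deletion themselves.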
You might want to reference the v3bw spec here: https://gitweb.torproject.org/torspec.git/tree/bandwidth-file-spec.txt
3.2.2 FlashFlow Measurer
The measurers take commands from the coordinator
The command protocol is not specified in this proposal.
For example, does the coordinator send the IPv4 and IPv6 addresses of the relay to the measurers?
Which deployment parameters are sent via the protocol, and which are hard-coded in configurations?
A Tor proposal did not seem the place for some of these protocols, options, etc. existing entirely outside little-t tor. We can certainly elaborate better if that's wrong.
To answer these specific questions: the coordinator would send fingerprints to the measurers, and the ~only config options the measurers will have is information regarding the coordinator from which they shall expect commands. All other FlashFlow options (e.g. measurement duration) are configured at the coordinator and the coord informs the measurers.
connect to target relays with many sockets, send them traffic, and verify the received traffic is the same as what was sent. Measurers need access to a lot of internal tor functionality. One strategy is to house as much logic as possible inside a compile-time-optional control port module that calls into other parts of tor. Alternatively FlashFlow could link against tor and call internal tor functions directly.
[XXX for now I'll assume that an optional little-t tor control port module housing a lot of this code is the best idea.]
Yes, please don't depend on internal, unspecified interfaces.
Notable new things that internal tor code will need to do on the measurer (client) side:
- Open many TLS+TCP connections to the same relay on purpose.
- Verify echo cells.
3.2.2.1 Open many connections
...
3.3 Security
...
- FlashFlow measurement system: Medium term
The medium term deployment stage begins after FlashFlow has been implemented and relays are starting to update to a version of Tor that supports it.
We avoid using tor versions to detect relay features. Instead, we use subprotocol versions:
https://gitweb.torproject.org/torspec.git/tree/tor-spec.txt#n2041
In the first tor release that supports the medium-term FlashFlow, let's reserve a "Link" protocol version:
https://gitweb.torproject.org/torspec.git/tree/tor-spec.txt#n2094
If any of the FlashFlow cells are relay cells, let's also reserve a "Relay" protocol version:
https://gitweb.torproject.org/torspec.git/tree/tor-spec.txt#n2122
(We don't want to pick the exact version numbers yet. Let's wait until the actual tor release.)
We plan to host a FlashFlow deployment consisting of a FF coordinator and a single FF measurer on a single 1 Gbit/s machine. Data produced by this deployment will be made available (semi?) publicly, including both v3bw files and intermediate results.
All directory authorities publish v3bw files at a standard URL, so if you use these files in voting, they will be public.
Any development changes needed during this time would go through separate proposals.
- FlashFlow measurement system: Long term
In the long term, finishing-touch development work will be done, including adding better authentication and measurement scheduling, and experiments will be run to determine the best way to integrate FlashFlow into the Tor ecosystem.
Any development changes needed during this time would go through separate proposals.
5.1 Authentication to Target Relay
Short term deployment already had FlashFlow coordinators using TLS certificates when connecting to relays, but in the long term, directory authorities will vote on the consensus parameter for which coordinators should be allowed to perform measurements. The voting is done in the same way they currently vote on recommended tor versions.
FlashFlow measurers will be updated to use TLS certificates when connecting to relays too. FlashFlow coordinators will update the contents of MSM_PARAMS cells to contain measurer TLS certificates instead of IP addresses, and relays will update to expect this change.
You'll want another new "Link" protocol version for this feature. And another type of link specifier.
5.2 Measurement Scheduling
Short term deployment only has one FF deployment running. Long term this may no longer be the case because, for example, more than one directory authority decides to adopt it and they each want to run their own deployment. FF deployments will need to coordinate between themselves to not measure the same relay at the same time, and to handle new relays as they join during the middle of a measurement period (during the day).
The following is quoted from Section 4.3 of the FlashFlow paper.
To measure all relays in the network, the BWAuths periodically determine the measurement schedule. The schedule determines when and by whom a relay should be measured. We assume that the BWAuths have sufficiently synchronized clocks to facilitate coordinating their schedules. A measurement schedule is created for each measurement period, the length p of which determines how often a relay is measured. We use a measurement period of p = 24 hours.
To help avoid active denial-of-service attacks on targeted relays, the measurement schedule is randomized and known only to the BWAuths. Before the next measurement period starts, the BWAuths collectively generate a random seed (e.g. using Tor’s secure-randomness protocol). Each BWAuth can then locally determine the shared schedule using pseudorandom bits extracted from that seed.
As noted above, communication between BWAuths reduces their independence, and adds additional risk and complexity in the protocol.
All of this is supposed to be non-interactive, yes. It is under-specified at this time (IMO). I think this is okay for now because this specifically is a long term thing currently far away.
Once-Off Shared Secret Exchange
Here's an alternative protocol, that does not require an additional shared random implementation:
The BWAuths manually exchange a shared secret key (SHARED_SECRET) out-of-band
Every day, the BWAuths independently derive a shared secret seed for the measurement protocol, using a hash function (H), and tor's public shared random value (SRV):
DAILY_SECRET = H(SHARED_SECRET | SRV)
We might also want to use the period number here, like the SRV and onion service hash ring specs.
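A sketch of the derivation, assuming H is SHA3-256 and "|" means byte concatenation (neither is fixed by the text above). The optional period number and its 8-byte big-endian encoding are likewise assumptions made for illustration.

```python
import hashlib
from typing import Optional


def daily_secret(shared_secret: bytes, srv: bytes,
                 period_num: Optional[int] = None) -> bytes:
    """DAILY_SECRET = H(SHARED_SECRET | SRV [| period_num])."""
    data = shared_secret + srv
    if period_num is not None:
        # Binding the period number prevents reuse across days if an SRV repeats.
        data += period_num.to_bytes(8, "big")
    return hashlib.sha3_256(data).digest()
```

Plain concatenation is only unambiguous if SHARED_SECRET and SRV have fixed lengths. If either can vary, a length prefix per field would be a safer encoding.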
The shared secret key should be rotated:
- each time a new BWAuth is added or removed from the network, and
- 1 year after the last rotation.
The key rotation can be performed over a few days, because:
- each BWAuth has one of two keys: the new key, or the old key,
- overlaps should be rare in practice,
- when there is an overlap, at most two BWAuths will overlap, one from each key,
- overlaps have a low impact for most relays.
The algorithm to create the schedule considers each measurement period to be divided into a sequence of t-second measurement slots. For each old relay, slots for each BWAuth to measure it are selected uniformly at random without replacement from all slots in the period that have sufficient unallocated measurement capacity to accommodate the measurement. When a new relay appears, it is measured separately by each BWAuth in the first slots with sufficient unallocated capacity. Note that this design ensures that old relays will continue to be measured, with new relays given secondary priority in the order they arrive.
It's unclear whether this protocol is interactive or not.
Here's a protocol that is explicitly non-interactive:
- Coordinators are assigned a daily order, based on each coordinator's certificate hash, and the current DAILY_SECRET.
- For each coordinator, in the daily order:
  a. Relays in a chosen consensus choose a slot at random, based on the DAILY_SECRET, the relay key, and the iteration number.
  b. If another coordinator is already measuring that relay in that slot, increase the iteration number, and repeat from a.
  c. If the slot is full for the current coordinator, increase the iteration number, and repeat from a.
  d. Otherwise, allocate that relay to that slot, for that coordinator.
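Here is a rough sketch of that non-interactive allocation, so each coordinator can compute the full schedule locally. Several details are assumptions and not part of the protocol text: SHA3-256 as the hash, length-prefixed inputs, relays processed in sorted key order, a per-slot capacity counted in relays rather than bandwidth, and an iteration cap to guarantee termination.

```python
import hashlib
from typing import Dict, List, Tuple


def _h(*parts: bytes) -> int:
    """Hash length-prefixed parts to an integer (SHA3-256 is an assumption)."""
    h = hashlib.sha3_256()
    for part in parts:
        h.update(len(part).to_bytes(2, "big") + part)
    return int.from_bytes(h.digest(), "big")


def build_schedule(daily_secret: bytes, coordinator_certs: List[bytes],
                   relay_keys: List[bytes], num_slots: int,
                   slot_capacity: int, max_iterations: int = 10000
                   ) -> Dict[Tuple[bytes, bytes], int]:
    """Map (coordinator_cert, relay_key) to a slot number."""
    # Daily order of coordinators, derived from DAILY_SECRET and each cert.
    order = sorted(coordinator_certs, key=lambda c: _h(daily_secret, c))
    taken = set()  # (relay_key, slot) pairs already claimed by any coordinator
    schedule: Dict[Tuple[bytes, bytes], int] = {}
    for cert in order:
        load = [0] * num_slots  # relays this coordinator has in each slot
        for relay in sorted(relay_keys):
            for iteration in range(max_iterations):
                slot = _h(daily_secret, relay, iteration.to_bytes(4, "big")) % num_slots
                if (relay, slot) in taken or load[slot] >= slot_capacity:
                    continue  # conflict or full slot: next iteration number
                taken.add((relay, slot))
                load[slot] += 1
                schedule[(cert, relay)] = slot
                break
    return schedule
```

Since every input is either shared (DAILY_SECRET, the chosen consensus) or public (certificates, relay keys), two coordinators running this on the same inputs get the same schedule without exchanging messages.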
We might also want to use other shared data here, like the consensus timestamp.
To make sure all the coordinators have the same consensus, we should keep a copy of the most recent shared consensus. Here's how we can select a shared consensus:
- if we're using a scheduled fetch, a consensus from at least 1 hour ago (usually 2300 UTC),
- if we're using a tor client to fetch, a consensus from at least 3 hours ago (usually 2100 UTC).
If there isn't a consensus for that time, we should keep the most recent consensus before that time.
It doesn't actually matter if the consensus is a little out of sync, most relays will have the same fingerprints, and end up in the same slots.
5.3 Experiments
[XXX todo]
5.4 Other Changes/Investigations/Ideas
...
- Citations
[0] F. Thill. Hidden Service Tracking Detection and Bandwidth Cheating in Tor Anonymity Network. Master’s thesis, Univ. Luxembourg, 2014.
[1] A. Johnson, R. Jansen, N. Hopper, A. Segal, and P. Syverson. PeerFlow: Secure Load Balancing in Tor. Proceedings on Privacy Enhancing Technologies (PoPETs), 2017(2), April 2017.
[2] Mike Perry: Graph onionperf and consensus information from Rob's experiments https://trac.torproject.org/projects/tor/ticket/33076
T
Hi Matt,
Thanks for the quick response!
I've trimmed the conversation to the comments that need further discussion.
On 25 Apr 2020, at 06:46, Matt Traudt pastly@torproject.org wrote:
On 4/23/20 21:05, teor wrote:
...
- msm_duration [1 byte]
What are the minimum and maximum valid values for this field? 1..255 ?
Do we want to limit measurements to 4 minutes at a protocol level?
In general, protocols should make invalid states impossible to represent. But do we want a 4 minute hard limit here?
This document suggests a measurement duration of 30 seconds. We see no reason to ever go above 1 minute. If there's a byte to spare, then sure let's make this a uint16.
I've thought about this a bit more, and from a user experience perspective, we also want a 30 second limit. (Most users will give up on a slow connection after 30 seconds.)
So as long as there is a documented limit in the protocol, we should be fine with 2 bytes.
second is the number of seconds since the measurement began. MSM_BG cells are sent once per second from the relay to the FlashFlow coordinator. The first cell will have this set to 1, and each subsequent cell will increment it by one. sent_bg_bytes is the number of background traffic bytes sent in the last second (since the last MSM_BG cell). recv_bg_bytes is the same but for received bytes.
The payload of MSM_ERR cells:
- err_code [1 byte]
- err_str [possibly zero-len null-terminated string]
We don't have strings in any other tor protocol cells.
If you need extensible error information, can I suggest using ext-type-length-value fields:
  N_EXTENSIONS      [1 byte]
  N_EXTENSIONS times:
     EXT_FIELD_TYPE [1 byte]
     EXT_FIELD_LEN  [1 byte]
     EXT_FIELD      [EXT_FIELD_LEN bytes]
https://gitweb.torproject.org/torspec.git/tree/rend-spec-v3.txt#n1518
If strings are necessary, please specify a character encoding (ASCII or UTF-8), and an allowed set of characters.
If we don't whitelist characters, we risk logging terminal escape sequences, or other arbitrary data.
I seem to remember strings being used between directory authority/directory mirror relays and clients to communicate certain errors (clock skew?), but in reality it's probably a *code* that is communicated, and what I'm thinking of is merely the Tor client interpreting the code for logging purposes.
There are two sources for clock skew warnings:
  * A binary time field in NETINFO cells
  * An HTTP header on directory documents
The header is text, but it's very structured, and at a different protocol layer.
In another part of the directory protocol, when authorities reject a relay descriptor upload, they send a rejection reason to the relay. That's unstructured text in a HTTP response. (But we do escape it before logging.)
Regardless, we probably don't really need a string. It occurs to me we might want *something* that carries more information than a code; for example, a MSM_ERR cell with a code stating "I'm refusing to be measured because I've been measured too recently" would benefit from a field stating either time till measurement allowed again or time since last measurement.
Yes, I think a code and ext-type-length-value fields for any additional info would work here.
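A rough sketch of an MSM_ERR payload built from an error code plus ext-type-length-value fields follows. The function names and the example code and type numbers are hypothetical; only the N_EXTENSIONS / EXT_FIELD_TYPE / EXT_FIELD_LEN / EXT_FIELD layout comes from the discussion above.

```python
import struct


def encode_msm_err(err_code: int, extensions: list[tuple[int, bytes]]) -> bytes:
    """Pack err_code, N_EXTENSIONS, then each type-length-value field."""
    if len(extensions) > 255:
        raise ValueError("N_EXTENSIONS must fit in one byte")
    out = bytearray(struct.pack("!BB", err_code, len(extensions)))
    for ext_type, value in extensions:
        if len(value) > 255:
            raise ValueError("EXT_FIELD_LEN must fit in one byte")
        out += struct.pack("!BB", ext_type, len(value)) + value
    return bytes(out)


def decode_msm_err(payload: bytes) -> tuple[int, list[tuple[int, bytes]]]:
    """Inverse of encode_msm_err. Trailing bytes (cell padding) are ignored."""
    if len(payload) < 2:
        raise ValueError("truncated MSM_ERR payload")
    err_code, n_ext = payload[0], payload[1]
    pos = 2
    extensions = []
    for _ in range(n_ext):
        if pos + 2 > len(payload):
            raise ValueError("truncated extension header")
        ext_type, ext_len = payload[pos], payload[pos + 1]
        pos += 2
        if pos + ext_len > len(payload):
            raise ValueError("truncated extension body")
        extensions.append((ext_type, payload[pos:pos + ext_len]))
        pos += ext_len
    return err_code, extensions
```

For instance, a "measured too recently" error could carry a 4-byte "seconds until allowed again" extension as `encode_msm_err(1, [(1, struct.pack("!I", 3600))])`, where both `1` values are made-up numbers for illustration.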
At this point, the relay
- Starts a repeating 1s timer on which it will report the amount of background traffic to the coordinator over the coordinator's connection.
- Enters "measurement mode" and limits the amount of background traffic it handles according to the torrc option/consensus parameter.
The relay decrypts and echoes back all MSM_ECHO cells it receives on measurement connections
Are MSM_ECHO cells relay cells? How much of the relay protocol does the measurer implement?
The references to decrypting cells suggest that MSM_ECHO cells are relay (circuit-level) cells. But earlier sections suggest that they are link cells.
If they are link cells, what key material is used for decryption? How do the measurer and relay agree on this key material?
If they are relay cells, do they use the ntor handshake? https://gitweb.torproject.org/torspec.git/tree/tor-spec.txt#n1132
MEASURE cells are cells like CREATE, CREATED, RELAY, etc.
In the Tor protocol specification, we call these "commands", and the cells are sent at the link level.
MSM_ECHO, MSM_PARAMS, etc. cells are MEASURE cells in the same way RELAY_BEGIN, RELAY_DATA, RELAY_SENDME, etc. are RELAY cells.
For relay cells, we call these "relay commands", and the cells are sent at the circuit level.
So it might be helpful to say "measure commands" here. It might also be helpful to distinguish "control" and "data" cells, in a similar way to the relay cell spec:
https://github.com/torproject/torspec/blob/master/tor-spec.txt#L1572
The relay needs to do AES on the MSM_ECHO cells (or for simplicity, the MSM_ECHO cell payload) like it does AES on relay cells. As for key material necessary to do that, that's an oversight. We've neglected to specify how it's derived.
Suggestions? Not a cryptographer, but off the top of my head, the measurer could simply tell the relay to use $key_i for MSM_ECHO cells on $connection_i. We just want the CPU load on the relay; we're not after security properties here (other than verifying the relay is actually doing the crypto, as discussed elsewhere).
I suggest that the client opens a real single-hop circuit, and sends RELAY_ECHO cells (a new relay command) on that circuit.
As part of this design:
  * RELAY_ECHO cells are only allowed from valid measurers
  * flow control is disabled on circuits from valid measurers (I think that's what you want here, but best to be explicit)
This design has a few advantages:
  * The design and coding is much simpler
  * FlashFlow automatically uses the latest relay crypto
  * The key material is automatically derived for you
  * The decryption and some verification is automatically performed for you
  * You can verify the cell contents using a simple memcmp()
There's a slight disadvantage:
  * When you skip decrypting a cell, the digest gets out of sync, so future cells have less validation. But I don't think that matters for single-hop circuits.
It's likely that your measurers will be network-bound, rather than CPU-bound. So you may be able to just use unmodified circuit crypto.
There are also security advantages to using unmodified relay crypto. If tor adds extra modes that skip decryption or verification, then it's easier to accidentally trigger those modes. (Via bugs or exploits.)
If we use unmodified relay crypto, then it's much harder to get tor into an insecure mode.
Here's what the cells would look like in detail:
  16 -- RELAY_ECHO   [forward]  [control]
  17 -- RELAY_ECHOED [backward] [control]
I think these should be control cells (circuit-level cells) rather than stream-level cells, because they are like RELAY_DROP:
10 -- RELAY_DROP [forward or backward] [control]
I don't have a strong opinion about the rest of the measure commands. They can stay as link-level cells. But if it turns out that it's easier to code them as circuit-level cells, we could add a new RELAY_MEASURE command.
until it has reported its amount of background traffic the same number of times as there are seconds in the measurement (e.g. 30 per-second reports for a 30 second measurement). After sending the last MSM_BG cell, the relay drops all buffered MSM_ECHO cells, closes all measurement connections, and exits measurement mode.
To be more precise here, can we say:
"the relay drops all inbound and outbound MSM_ECHO cells from measurers associated with the completed measurement"
Can we avoid assuming that there is always only one measurement happening at one time?
I think it's safe/smart/necessary to assume that, for a given relay, there is always only zero or one measurements happening.
- Measurements are scheduled s.t. coordinators won't try to measure a relay at the same time.
- A coordinator trying to start a measurement while another one is ongoing can simply be sent a MSM_ERR cell stating as such.
You're right, the relay can resolve clashes. It's important that we make that explicit.
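One way to make the clash handling explicit is a tiny state holder on the relay that admits at most one measurement at a time. This is only a sketch: the class name, the coordinator identifier type, and the error code value are all hypothetical.

```python
from typing import Optional

# Hypothetical MSM_ERR code meaning "another measurement is in progress".
ERR_MEASUREMENT_IN_PROGRESS = 2


class RelayMeasurementState:
    """Tracks the single measurement a relay allows at any time."""

    def __init__(self) -> None:
        self.active_coordinator: Optional[str] = None

    def try_start(self, coordinator_id: str) -> Optional[int]:
        """Return None on success, or an MSM_ERR code if one is ongoing."""
        if self.active_coordinator is not None:
            return ERR_MEASUREMENT_IN_PROGRESS
        self.active_coordinator = coordinator_id
        return None

    def finish(self, coordinator_id: str) -> None:
        """End the measurement; only the coordinator that started it can."""
        if self.active_coordinator == coordinator_id:
            self.active_coordinator = None
```

A second coordinator calling `try_start` during an ongoing measurement gets the error code back, which the relay would then send in a MSM_ERR cell.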
During the measurement the relay targets a ratio of background traffic to measurement traffic as specified by a consensus parameter/torrc option. For a given ratio r, if the relay has handled x cells of measurement traffic recently, Tor then limits itself to y = xr/(1-r) cells of non-measurement traffic this scheduling round. The target will enforce that a minimum of 10 Mbit/s of measurement traffic is recorded since the last background traffic scheduling round to ensure it always allows some minimum amount of background traffic.
Do you mean "a maximum of 10 Mbit/s of measurement traffic" ?
No. When getting ready to handle background traffic, if there has been less than 10 Mbit/s of measurement traffic recently, Tor will limit background traffic as if there was indeed 10 Mbit/s of measurement traffic.
This way the relay can always send at least some background traffic, and a malfunctioning/malicious FlashFlow deployment cannot stop all background client traffic going through a relay for 30 seconds by not sending it (very much) measurement traffic.
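The rule described above can be written down as a small function. This is a sketch under the assumption that measurement and background traffic are expressed in the same unit (Mbit/s here); the function and parameter names are made up for illustration.

```python
def background_limit(measured: float, r: float, floor: float = 10.0) -> float:
    """Background allowance y = x*r/(1-r), with x never below `floor`.

    `measured` is the recent measurement traffic and `floor` is the assumed
    minimum (10 Mbit/s in the proposal); both use the same unit.
    `r` is the target background ratio and must satisfy 0 <= r < 1.
    """
    if not 0 <= r < 1:
        raise ValueError("r must be in [0, 1)")
    x = max(measured, floor)
    return x * r / (1 - r)
```

With r = 0.25, a measurer that sends nothing still leaves the relay about 3.3 Mbit/s of background traffic, while 30 Mbit/s of measurement traffic allows 10 Mbit/s of background traffic.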
I'm still a bit confused here.
When you say: "The target will enforce that a minimum of 10 Mbit/s of measurement traffic is recorded"
I think you mean: "... regardless of the actual traffic sent by the measurer."
But that raises another concern:
What about relays with very low bandwidths? Will they reserve all their traffic for users, and none for the measurer?
Using the suggested r=25%, the maximum non-measurement traffic is:
y = (10 Mbit/s)(0.25)/(1-0.25) ≈ 3.3 Mbit/s
So a relay with an actual capacity of 3.3 Mbit/s, which is fully loaded with user traffic, will send no measurement traffic.
That seems... unexpected.
At the moment, tor directory authorities default to:
  AuthDirFastGuarantee 100 Kbytes
  AuthDirGuardBWGuarantee 2 Mbytes
So maybe we should derive the limit based on these values?
3.2.1 FlashFlow Coordinator
The coordinator is responsible for scheduling measurements, aggregating results, and producing v3bw files. It needs continuous access to new consensus files, which it can obtain by running an accompanying Tor process in client mode.
Recent tor versions go dormant when they haven't built circuits for a while. There are options that prevent dormancy, but they are only designed for interactive applications.
Is the FlashFlow coordinator going to use tor to implement the tor link protocol?
If the coordinator uses tor, then it can use the same tor client instance that's downloading its consensuses.
Otherwise, you might just be better off using a small stem script and a download timer.
If you use a timer, you can download each new consensus, shortly after it is created. (Clients often have consensuses that are 1-2 hours old, unless specifically configured to fetch from directory authorities. Even then, they can take up to an hour to download a new consensus.)
As described elsewhere, the coordinator uses a Tor client in order to avoid implementing the tor link protocol itself. If there is not already a way to make a Tor client download every new consensus (e.g. a torrc option or an hourly control port command), we'll want to add that.
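If a timer ends up being used, the scheduling part is small. The sketch below only computes when to fetch next; it assumes a new consensus becomes valid at the top of each hour (as on the current public network) and uses a made-up 5-minute grace delay, which must be shorter than one hour. How the fetch itself is done (torrc option, control port command, or script) is left open.

```python
from datetime import datetime, timedelta


def next_consensus_fetch(
    now: datetime, delay: timedelta = timedelta(minutes=5)
) -> datetime:
    """Return the next fetch time: `delay` past an hour boundary, after `now`."""
    top_of_hour = now.replace(minute=0, second=0, microsecond=0)
    candidate = top_of_hour + delay
    if candidate <= now:
        # This hour's fetch time has already passed; use the next hour's.
        candidate += timedelta(hours=1)
    return candidate
```

The coordinator would sleep until the returned time, fetch the consensus, and then call the function again.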
If the coordinator is constantly sending network traffic to relays, then it shouldn't go dormant.
Here are the torrc options you might want to set on the coordinator:
  # Set this to your maximum expected gap between relay measurements,
  # including network downtime and other emergencies.
  # Particularly important during the initial deployment.
  DormantClientTimeout 1 week

  # You may also need to set
  FetchUselessDescriptors 1

  # Get new relays as fast as possible.
  FetchDirInfoEarly 1
  FetchDirInfoExtraEarly 1
This is starting to look like the sbws config; you probably want most of these options on coordinators and measurers: https://github.com/torproject/sbws/blob/master/sbws/globals.py#L20
What if the capacity is limited at some other point on the internet?
For example:
- an intermediate transit provider between the measurer and all the chosen relays
- the chosen relays are all on the same local network
Ideally a single FlashFlow deployment's measurers are diverse to help mitigate the first point.
For the second, I don't have a good idea at this time. That shouldn't happen regularly. It will happen sometimes though, so perhaps this motivates a modification in how the coordinator chooses the weight for a relay. Instead of the result of the latest measurement, perhaps the highest result from the last X measurements.
That seems like a good idea.
It might also help to measure relays in each family in separate slots. You might also want to do the same thing with relays that are in: * the same IPv4 /24 * the same IPv6 /48
Or at the very least, relays on the same IP address.
Relays without existing capacity estimates are assumed to have the 75th percentile capacity of the current network.
If a relay is not online when it's scheduled to be measured, it doesn't get measured that day.
Online in the consensus, or listening via its ORPort? (There's a delay of up to 3 hours here, whenever the relay goes up or down.)
What bandwidth weight does an offline relay get? sbws has had issues because it drops offline relays.
Online as in both, I think.
I'm not up to speed on, or have forgotten, why continuing to give weight to offline relays is important (and this may not be the place to enlighten me). Naively I'd say zero. Assuming that's stupid, I **think** whatever weight FlashFlow would give it were it online is smarter than some minimum weight value. Suggestions?
You'll need relays to be in the consensus to do connection crypto, and listening on their ORPort to actually connect. Then you can measure.
Relays sometimes drop out of the consensus between their measurement and the creation of the v3bw file. So don't check if they are online when you create that file.
Using the median of the past few measurements is a good idea anyway: tor has a daily user bandwidth cycle. And it helps deal with missing measurements.
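Both ideas floated in this part of the thread (the highest of the last X measurements, or the median of the past few) fit in one small helper. This is a sketch only; the window size of 5 and the names are assumptions, and which method to prefer is still an open question here.

```python
from statistics import median


def capacity_estimate(
    history: list[float], window: int = 5, method: str = "max"
) -> float:
    """Combine the most recent `window` measurements (oldest first)."""
    if window < 1:
        raise ValueError("window must be at least 1")
    recent = history[-window:]
    if not recent:
        raise ValueError("no measurements available")
    if method == "max":
        return max(recent)
    if method == "median":
        return median(recent)
    raise ValueError(f"unknown method: {method}")
```

Because the helper works from stored history, a relay that missed its latest slot still gets a weight from its earlier measurements.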
3.2.1.2.1 Example
...
3.2.1.3 Generating V3BW files
Every hour the FF coordinator produces a v3bw file in which it stores the latest capacity estimate for every relay it has measured in the last week. The coordinator will create this file on the host's local file system. Previously-generated v3bw files will not be deleted by the coordinator.
Seems risky: we've seen Torflow fail in the past because it filled up the disk with bandwidth files.
What's the required disk capacity for a few years of bandwidth files?
We can ship a script or provide a parameter to keep the last X v3bw files if that would be preferable to relying on bwauths using logrotate themselves or otherwise finding an archival/deletion strategy that fits their needs.
If you provide a default maximum age, then you can document the disk capacity that's required to keep that many files.
If operators have more or less disk, they can change the defaults.
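A default maximum age and the matching disk estimate could look roughly like this. The 30-day default and the function names are placeholders; the 24 files per day follows from the coordinator writing one v3bw file every hour.

```python
SECONDS_PER_DAY = 86400


def files_to_delete(
    files: dict[str, float], now: float, max_age_days: float = 30.0
) -> list[str]:
    """Return names of v3bw files whose mtime is older than the maximum age.

    `files` maps file name to modification time in seconds since the epoch.
    """
    cutoff = now - max_age_days * SECONDS_PER_DAY
    return sorted(name for name, mtime in files.items() if mtime < cutoff)


def required_disk_bytes(
    file_size: int, max_age_days: float = 30.0, files_per_day: int = 24
) -> int:
    """Estimate disk space needed to keep `max_age_days` of hourly files."""
    return int(file_size * files_per_day * max_age_days)
```

Documenting the output of `required_disk_bytes` for a typical file size would give operators a concrete number to compare against their available disk.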
3.2.2 FlashFlow Measurer
The measurers take commands from the coordinator
The command protocol is not specified in this proposal.
For example, does the coordinator send the IPv4 and IPv6 addresses of the relay to the measurers?
Which deployment parameters are sent via the protocol, and which are hard-coded in configurations?
A Tor proposal did not seem the place for some of these protocols, options, etc. existing entirely outside little-t tor. We can certainly elaborate better if that's wrong.
I'm not sure. It's helpful to have a design overview somewhere, and to reference it from the proposal.
I don't have strong opinions about exactly where it is located. Most similar documentation is external, but the BridgeDB spec is part of the Tor specifications: https://github.com/torproject/torspec/blob/master/bridgedb-spec.txt
One helpful question is:
Who will maintain this software over the long term?
Ask them how they want it to be specified.
T
On 4/23/20 1:48 PM, Matt Traudt wrote:
5.4 Other Changes/Investigations/Ideas
- How can FlashFlow data be used in a way that doesn't lead to poor load balancing given the following items that lead to non-uniform client behavior:
- Guards that high-traffic HSs choose (for 3 months at a time)
- Guard vs middle flag allocation issues
- New Guard nodes (Guardfraction)
- Exit policies other than default/all
- Directory activity
- Total onion service activity
- Super long-lived circuits
- What is the explanation for dennis.jackson's scary graphs in this [2] ticket? Was it because of the speed test? Why? Will FlashFlow produce the same behavior?
It will also be wise to provide a way for relays to signify that they are on the same machine. I bet concurrent machine deployments are one of the top contributors to the long tail of bad perf we saw caused by the FlashFlow experiment[2]. If FlashFlow measures each such relay as having the full link capacity instead of a shared fraction, this is obviously going to result in overload on those relays, leading to a long tail of bad perf when they are chosen and are also overloaded. It is unlikely that we can deploy a FlashFlow that has this long tail perf problem without fixing this and related balancing issues (though hopefully most will be smoothed over by sbws).
This is a little tricky, because we might not want rogue relays joining each other's "machines" (similar to the Family problem), but for testing, something as simple as how MyFamily works would be great. Ideally, though, relays would ask or detect that they are concurrently running in nearby IP space and either warn the operator to set the flag, or set it automatically.
We actually have this work included in a future performance funding proposal, but the timeline on that getting approved (or even rejected) is so far out that we should figure out a way to do this before that, especially if FlashFlow development is going to begin soon.
[2] Mike Perry: Graph onionperf and consensus information from Rob's experiments https://trac.torproject.org/projects/tor/ticket/33076
On 16 May 2020, at 16:05, Mike Perry mikeperry@torproject.org wrote:
...
We could assume that relays on the same IPv4 /24 or IPv6 /48 share a network link, and re-do the experiment.
Then we could tweak the network size based on those results. We'd need to compromise between "false sharing" and "missed sharing".
Then individual operators could fine-tune that initial heuristic using the "same network link" config.
(This is similar to how MyFamily works: Tor assumes that relays in the same IPv4 /16 and IPv6 /32 have the same network operator. Then individual relay operators can declare extra families using MyFamily.)
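The initial heuristic could be prototyped along these lines. This is a sketch, not a proposed implementation: the /24 and /48 defaults are the starting guesses from above and would be tuned after re-running the experiment, and the function names and input format (nickname to address) are invented for the example.

```python
import ipaddress
from collections import defaultdict


def link_group(address: str, v4_prefix: int = 24, v6_prefix: int = 48) -> str:
    """Return the network an address falls in, used as a shared-link key."""
    ip = ipaddress.ip_address(address)
    prefix = v4_prefix if ip.version == 4 else v6_prefix
    # strict=False masks off the host bits instead of raising an error.
    return str(ipaddress.ip_network(f"{ip}/{prefix}", strict=False))


def group_relays(relays: dict[str, str]) -> dict[str, list[str]]:
    """Group relay nicknames by the network their address falls in."""
    groups: dict[str, list[str]] = defaultdict(list)
    for nickname, address in relays.items():
        groups[link_group(address)].append(nickname)
    return dict(groups)
```

Relays that land in the same group would be treated as sharing a link unless an operator's "same network link" setting says otherwise.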
T
On Thu, Apr 23, 2020 at 2:48 PM Matt Traudt pastly@torproject.org wrote:
Hi! I've got some comments on the FlashFlow proposal; I'll start with the ones that I think are most important, so that we can try to get them out of the way.
First off, I'm concerned about the approach where measurers get to consume a certain amount of bandwidth, with only a set fraction left to devote to the background traffic. It seems like a hostile set of measurers could use this authority to introduce traffic patterns on the network to assist in traffic analysis. In general, having regularly scheduled and visible changes in relay capacity seems to me like it would help out traffic analysis a good deal.
Second, the "MSM_BG" information type also seems like a serious traffic analysis risk. It is, literally, telling the measurers a report of how much traffic was sent each second on other connections. Previously we decided that a much coarser summary than this was too much information to publish in bandwidth-history lines, and I'm worried not to see any analysis here.
{In both of the above cases we might say, "well, an attacker could do that anyway!" But to get the traffic information, an attacker would need to compromise the upstream connection, and to introduce traffic spikes the attacker would need to risk detection. This proposal as written would make both of these traffic analysis opportunities an expected part of the infrastructure, which seems not-so-good to me.}
Third, I don't understand why we're using cell crypto here but we aren't using RELAY cells or (apparently?) circuits. Since TLS is already in play, we'll already be measuring the relays' encryption performance. But if we do decide that cell crypto is needed, then it's way easier to get that crypto happening if there are circuits involved. I think there's been some discussion of that on IRC; I'd suggest that we try to make that work if we can.
Fourth, this approach to authenticating echo cell contents seems needlessly complicated. Instead of using random contents and remembering a fraction of cells, it would make more sense for measurers to use a keyed pseudorandom stream function to generate the cells, and to verify the contents of all the cells as they come back in. (AES128-CTR and ChaCha8 and SHAKE128 all have nice properties here.)
Fifth, using IP addresses for identification is NOT something we do on the production network. I think we should authenticate measurers by identity key, not by IPv4 address (as is happening here, unless I misunderstand.)
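For the fourth point, a keyed pseudorandom stream could be prototyped as below. This sketch uses SHAKE128 from Python's hashlib and simply prefixes a fixed-length key to the cell index, which is a simplification rather than a vetted keyed construction. PAYLOAD_LEN and the function names are assumptions; the real echo payload size depends on the final cell format.

```python
import hashlib
import hmac

# Assumed echo payload size in bytes, for illustration only.
PAYLOAD_LEN = 498


def echo_payload(key: bytes, index: int, length: int = PAYLOAD_LEN) -> bytes:
    """Derive the payload of the index-th echo cell from SHAKE128(key || index)."""
    return hashlib.shake_128(key + index.to_bytes(8, "big")).digest(length)


def verify_echo(key: bytes, index: int, echoed: bytes) -> bool:
    """Regenerate the expected payload and compare it with what came back."""
    expected = echo_payload(key, index)
    return hmac.compare_digest(expected, echoed)
```

The measurer then only needs to remember the key and a counter, not the contents of the cells it sent, and can still check every cell that comes back.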
yrs,
On 6/2/20 3:01 PM, Nick Mathewson wrote:
On Thu, Apr 23, 2020 at 2:48 PM Matt Traudt pastly@torproject.org wrote:
Hi! I've got some comments on the FlashFlow proposal; I'll start with the ones that I think are most important, so that we can try to get them out of the way.
First off, I'm concerned about the approach where measurers get to consume a certain amount of bandwidth, with only a set fraction left to devote to the background traffic. It seems like a hostile set of measurers could use this authority to introduce traffic patterns on the network to assist in traffic analysis. In general, having regularly scheduled and visible changes in relay capacity seems to me like it would help out traffic analysis a good deal.
Second, the "MSM_BG" information type also seems like a serious traffic analysis risk. It is, literally, telling the measurers a report of how much traffic was sent each second on other connections. Previously we decided that a much coarser summary than this was too much information to publish in bandwidth-history lines, and I'm worried not to see any analysis here.
{In both of the above cases we might say, "well, an attacker could do that anyway!" But to get the traffic information, an attacker would need to compromise the upstream connection, and to introduce traffic spikes the attacker would need to risk detection. This proposal as written would make both of these traffic analysis opportunities an expected part of the infrastructure, which seems not-so-good to me.}
Not just anyone can be a measurer. A bandwidth authority runs a coordinator and chooses to trust some small number of measurers. In practice, knowing the humans behind the dirauths today, I'd expect each would only trust measurers they run themselves.
Third, I don't understand why we're using cell crypto here but we aren't using RELAY cells or (apparently?) circuits. Since TLS is already in play, we'll already be measuring the relays' encryption performance. But if we do decide that cell crypto is needed, then it's way easier to get that crypto happening if there are circuits involved. I think there's been some discussion of that on IRC; I'd suggest that we try to make that work if we can.
Yes, the outcome of some IRC discussion with asn is that we should start with MSM_ECHO cells being RELAY cells until we discover that is untenable. The proposal will be updated to state this, and to clarify that we'll be building circuits on which to send measurement traffic.
Fourth, this approach to authenticating echo cell contents seems needlessly complicated. Instead of using random contents and remembering a fraction of cells, it would make more sense for measurers to use a keyed pseudorandom stream function to generate the cells, and to verify the contents of all the cells as they come back in. (AES128-CTR and ChaCha8 and SHAKE128 all have nice properties here.)
The motivation here is to do everything possible to prevent measurers from being a bottleneck before the relay, which was a problem for us in prototyping. This final transition-able FlashFlow is starting out with verifying all cells the normal way so we can see if this can of worms is even necessary. I'm not optimistic.
Fifth, using IP addresses for identification is NOT something we do on the production network. I think we should authenticate measurers by identity key, not by IPv4 address (as is happening here, unless I misunderstand.)
It's only for the short-term deployment that we propose using IP addresses for identification. At this point there's 1 FlashFlow deployment being operated by us and measuring relays that have opted in to running our patches (realistically: just us, but hopefully some adventurous operators too). In the medium/long term, coords/measurers will use proper TLS identities that will be checked by the relays.
Maybe that's still unacceptable, but I just wanted to make that clear.
Matt