Hi,
tldr:
- more outdated relays
(that is a claim I'm making and you could
easily prove me wrong by recreating the 0.3.3.x alpha
repos, shipping 0.3.3.7 in them, and seeing how things evolve
after a week or so)
- more work for the tpo website maintainer
- less happy relay operators [3][4]
- more work for repo maintainers? (since a new repo needs to be created)
When the tor 0.3.4 alpha repos (deb.torproject.org) first appeared on 2018-05-23
I was about to submit a PR for the website to include it in the sources.list
generator [1] on tpo, but didn't because I wanted to wait for a previous PR to be merged first.
The outstanding PR got merged eventually (2018-06-28), but I still did not submit a PR to
update the repo generator for 0.3.4.x, and here is why.
Recently I was wondering why there are so many relays running tor version 0.3.3.5-rc
(> 3.2% CW fraction; see OrNetStats or Relay Search).
Then I realized that this was the last version the tor-experimental-0.3.3.x-*
repos were shipping before they got abandoned due to the new 0.3.4.x-* repos
(I can no longer verify it since they got removed by now).
Peter made it clear in the past that the current way to
have per-major-version debian alpha repos (i.e. tor-experimental-0.3.4.x-jessie)
will not change [2]:
> If you can't be bothered to change your sources.list once or twice a
> year, then you probably should be running stable.
but maybe someone else would be willing to invoke an
"ln" command every time a new alpha repo is born:
tor-alpha-jessie -> tor-experimental-0.3.4.x-jessie
once 0.3.5.x repos are created the link would point to
tor-alpha-jessie -> tor-experimental-0.3.5.x-jessie
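A minimal sketch of what maintaining such a link could look like (the directory names here are illustrative, not the actual deb.torproject.org layout):

```shell
#!/bin/sh
# Hypothetical sketch: keep a stable "tor-alpha-<dist>" name pointing at the
# current per-major-version alpha repo directory.
set -eu
cd "$(mktemp -d)"                       # stand-in for the real repo root
mkdir tor-experimental-0.3.5.x-jessie
# -s symbolic, -f replace an existing link, -n don't dereference the old link
ln -sfn tor-experimental-0.3.5.x-jessie tor-alpha-jessie
readlink tor-alpha-jessie
```

Operators would then keep `tor-alpha-jessie` in their sources.list and automatically follow each new major alpha series.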
It is my opinion that this will help reduce the amount of relays running
outdated versions of tor.
It will certainly avoid having to update the tpo website, which isn't a big task
and could probably be automated, but currently isn't.
"..but that would cause relay operators to jump from i.e. 0.3.3.x to 0.3.4.x alphas
(and break setups)!"
Yes, and I think that is better than relays being stuck on an older version because
the former repo no longer exists. Operators can still choose the old per-version
repos, which will not jump to newer major versions.
[1] https://www.torproject.org/docs/debian.html.en#ubuntu
[2] https://trac.torproject.org/projects/tor/ticket/14997#comment:3
[3] https://lists.torproject.org/pipermail/tor-relays/2018-June/015549.html
[4] https://trac.torproject.org/projects/tor/ticket/26474
--
https://twitter.com/nusenu_
https://mastodon.social/@nusenu
Hi,
every now and then I'm in contact with relay operators
about the "health" of their relays.
Following these 1:1 discussions and the discussion on tor-relays@
I'd like to raise two issues with you (the developers) with the goal
of helping improve relay operations and the end-user experience in the long term:
1) DNS (exits only)
2) tor relay health data
1) DNS
------
Current situation:
Arthur Edelstein provides public measurements to tor exit relay operators via
his page at: https://arthuredelstein.net/exits/
This page is updated once daily.
The process operators use to consume that data looks like this:
- first they watch Arthur's measurement results
- if their failure rate is non-zero they try to tweak/improve/change their setup
- then they wait another 24 hours (for the next measurement)
This is a somewhat suboptimal and slow feedback loop, and the data is probably
also less accurate and less valuable than the data the tor
process itself could provide.
Suggestion for improvement:
Expose the following DNS status information
via tor's control port to help debug and detect DNS issues on exit relays:
(total numbers since startup)
- number of DNS queries sent to the resolver
- number of DNS queries sent to the resolver due to a RESOLVE request
- number of DNS queries sent to the resolver due to a reverse RESOLVE request
- number of queries that did not result in any answer from the resolver
- breakdown of the number of responses by response code (RCODE)
https://www.iana.org/assignments/dns-parameters/dns-parameters.xhtml#dns-pa…
- maximum number of DNS queries sent per circuit
If this causes a significant performance impact this feature should be disabled
by default.
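As a sketch of how an operator-side script might summarize such counters (the counter names and structure here are purely illustrative, since none of these exist in tor yet):

```python
# Hypothetical sketch: turn the proposed DNS counters into failure-rate
# percentages an exit operator could alert on. The dict layout and key
# names are invented for illustration; tor exposes none of this today.
def dns_summary(counters):
    total = counters["queries_sent"]
    unanswered = counters["queries_unanswered"]
    servfail = counters["rcode"].get(2, 0)  # RCODE 2 = SERVFAIL
    nxdomain = counters["rcode"].get(3, 0)  # RCODE 3 = NXDOMAIN
    pct = lambda n: 100.0 * n / total if total else 0.0
    return {
        "unanswered_pct": pct(unanswered),
        "servfail_pct": pct(servfail),
        "nxdomain_pct": pct(nxdomain),
    }

example = {"queries_sent": 2000, "queries_unanswered": 20,
           "rcode": {0: 1900, 2: 30, 3: 50}}
print(dns_summary(example))
```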
2) general relay health metrics
--------------------------------
Compared to other server daemons (webserver, DNS server, ..)
tor provides little data for operators to detect operational issues
and anomalies.
I'd suggest providing the following stats via the control port:
(most of them are already written to logfiles by default but not accessible
via the control port as far as I've seen)
- total amount of memory used by the tor process
- number of currently open circuits
- circuit handshake stats (TAP / NTor)
DoS mitigation stats:
- number of circuits killed with too many cells
- number of circuits rejected
- number of marked addresses
- number of connections closed
- number of single-hop clients refused
- number of closed/failed circuits broken down by their reason value
https://gitweb.torproject.org/torspec.git/tree/tor-spec.txt#n1402
https://gitweb.torproject.org/torspec.git/tree/control-spec.txt#n1994
- number of closed/failed OR connections broken down by their reason value
https://gitweb.torproject.org/torspec.git/tree/control-spec.txt#n2205
If this causes a significant performance impact this feature should be disabled
by default.
cell stats
- extra info cell stats
as defined in:
https://gitweb.torproject.org/torspec.git/tree/dir-spec.txt#n1072
This data should be useful to answer the following questions:
- High level questions: Is the tor relay healthy?
- is it hitting any resource limits?
- is the tor process under unusual load?
- why is tor using more memory?
- is it slower than usual at handling circuits?
- can the DNS resolver handle the number of DNS queries tor is sending it?
This data could help prevent errors from occurring or provide
additional data when trying to narrow down issues.
When it comes to the question:
**Is it "safe" to make this data accessible via the controlport?**
I assume it is safe for all information that current versions of
tor write to logfiles or even publish as part of their extra-info descriptors.
Should tor provide this or similar data,
I'm planning to write scripts for operators to make use
of that data (for example a munin plugin that connects to tor's control port).
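The collection step of such a munin plugin might look like this sketch. The `stats/*` keys below are invented placeholders for the proposed data; only `traffic/read` and `traffic/written` are GETINFO keys that exist in tor today:

```python
# Sketch of a munin-style collector: poll GETINFO keys and emit
# "name.value N" lines, skipping keys this tor version doesn't support.
KEYS = [
    "traffic/read",                  # exists: total bytes read since startup
    "traffic/written",               # exists: total bytes written
    "stats/circuits/open",           # hypothetical (proposed above)
    "stats/dos/circuits-rejected",   # hypothetical (proposed above)
]

def collect(get_info, keys=KEYS):
    """get_info: a callable like stem's Controller.get_info(key)."""
    lines = []
    for key in keys:
        try:
            value = get_info(key)
        except Exception:
            continue  # key not supported; skip it
        lines.append("%s.value %s" % (key.replace("/", "_"), value))
    return lines

# With the stem library this could be wired up roughly as:
#   from stem.control import Controller
#   with Controller.from_port(port=9051) as c:
#       c.authenticate()
#       print("\n".join(collect(c.get_info)))
```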
I'm happy to help write updates for control-spec should these features
seem reasonable to you.
Looking forward to hearing your feedback.
nusenu
--
https://twitter.com/nusenu_
https://mastodon.social/@nusenu
Hi all
I polished up the FlashFlow proposal based on the feedback provided by
Teor, Nick, and Mike. I converted it to markdown, and pasted the new
text at the bottom of this email. The updated proposal is also in my
fork of torspec on gitlab [0]; the branch with the changes is
"flashflow-revision".
I'd like to address Nick's concerns [1] again [2].
1. Measurers get to consume lots of bandwidth at relays. Malicious
measurers could do bad things. More generally, these measurements may
induce traffic patterns that assist traffic analysis.
Measurers are trusted entities. In practice we expect each bwauth
wanting to run FlashFlow to run a deployment entirely on their own
despite FF's design allowing people they trust to run the measurers. I
do not think measurers themselves being malicious is a major concern.
Relays target some ratio of background/measurement traffic during a
measurement. If the amount of traffic they normally see is less than the
amount of background traffic allowed by this ratio during a measurement,
then there should be no observable impact on the background traffic
during the measurement. To illustrate: assume the ratio is 25%. A relay
that is only forwarding traffic at 10% of its capacity will not have to
throttle traffic during the measurement.
But what about relays that *do* handle more traffic than the ratio allows?
First an easy mitigation (not solution): The ratio can be raised, thus
reducing the number of relays with normal background traffic levels
exceeding it. We suggest a ratio of 25% (maximum inflation factor of
1.33); it could be raised to, say, 50% (2.0) or even 75% (4.0) if we're
okay with an adversary doubling or quadrupling its weight (how? [fn1]).
I assume this is not okay.
Another easy mitigation: relay capacities don't change very often, so
don't measure every relay every day like we suggest. Measure new relays
ASAP, but for relays with an existing measurement, only measure them
once a week. Just like that, the adversary has to wait much longer for
the FF-induced pattern.
Another idea: first group the relays (say, randomly by hashing their
keys into k bins, with maybe k=10). If the entire network could be
measured in t time (we showed how 5h–6h might currently suffice), then
we could have each group of relays simultaneously start restricting
background traffic at a different p=t/k period of time (for k=10 and t=6
hours, we would have p=36 minutes). All relays in a group would be
measured during their period. This would reduce performance overall but
would limit the adversary to learning from the FF attack only the group
that a measured relay is in.
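The grouping step could be as simple as hashing each relay's fingerprint into one of k bins (a sketch; the hash choice and bin count are illustrative):

```python
import hashlib

# Sketch: deterministically assign a relay to one of k measurement groups
# by hashing its identity fingerprint. Every party computes the same bin
# without coordination.
def relay_bin(fingerprint, k=10):
    digest = hashlib.sha256(fingerprint.encode("ascii")).digest()
    return int.from_bytes(digest[:8], "big") % k

fp = "9695DFC35FFEB861329B9F1AB04C46397020CE31"  # example fingerprint
print(relay_bin(fp))  # stable value in [0, 10)
```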
Finally we note that the measurement schedule is random and known only
to the bwauths/coordinators. Unless the adversary has infiltrated a FF
deployment or bwauth, attacks involving the adversary referencing a
known schedule should not be possible.
The adversary could maintain connections through all relays and monitor
each for a drop in performance indicating it's under measurement;
however, they have the burden of performing this monitoring, making it
more costly and similar to just performing the attack (introducing
spikes/patterns/congestion) itself. Moreover, if the adversary were able
to monitor the performance of all relays over time in this way, then they
could just use natural variations in performance (due to varying client
load) to perform the correlation attack anyway:
"Stealthy Traffic Analysis of Low-Latency Anonymous Communication
Using Throughput Fingerprinting"
Prateek Mittal et al.
ACM CCS 2011
https://dl.acm.org/doi/pdf/10.1145/2046707.2046732
2. MEAS_BG cells leak way too much info
These cells go to the FF coordinators, operated by the bwauths. Coords
don't use these cells beyond calculating the capacities of relays. An
adversary looking at v3bw files or votes does not learn anything about
the amount of background traffic that went through a relay during its
measurement.
3. Do cell crypto, use circuits
As mentioned in [2], the implementation is starting out doing cell
crypto on all cells at the measurer. The proposal has been updated to
clarify that circuits are used and that the measurement traffic (MEAS_ECHO
cells) consists of RELAY cells.
4. Use a pseudorandom stream function to generate cells
Good idea that we will investigate and talk about more. Again, we're
worried about the measurer's CPU ending up being a bottleneck before its
network, which happened to us in prototyping: the measurer was at 100%
CPU while a relay on the same type of machine was not.
5. IPs for measurer identification
(copy/paste from previous response to Nick)
It's only the short-term deployment that we propose should use IP addresses
for identification. At this point there's 1 FlashFlow deployment being
operated by us and measuring relays that have opted in to running our
patches (realistically: just us, but hopefully some adventurous
operators too). In the medium/long term coords/measurers will use proper
TLS identities that will be checked by the relays.
Maybe that's still unacceptable, but I just wanted to make that clear.
[0]: https://gitlab.torproject.org/pastly/torspec
[1]: https://lists.torproject.org/pipermail/tor-dev/2020-June/014341.html
[2]: https://lists.torproject.org/pipermail/tor-dev/2020-June/014342.html
[fn1]: The relay allocates all its bandwidth to measurement traffic
(leaving none for background traffic), yet claims lots of background
traffic. The relay will be trusted to be telling the truth up to the
ratio. A 1 Gbit/s relay that sustains 1 Gbit/s of measurement traffic
can claim up to 333 Mbit/s of client traffic (r=25%), 1 Gbit/s (r=50%),
or 3 Gbit/s (r=75%). In general, the inflation factor is calculated as 1/(1-r).
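A quick check of the footnote's arithmetic (a worked example, not part of any implementation):

```python
# With measured throughput M (Mbit/s) and background ratio r, a relay can
# claim up to M*r/(1-r) of background traffic, for a total inflation
# factor of 1/(1-r).
def max_claimed_bg(measured_mbit, r):
    return measured_mbit * r / (1 - r)

for r in (0.25, 0.50, 0.75):
    print(r, round(max_claimed_bg(1000, r)), round(1 / (1 - r), 2))
# r=0.25 -> ~333 Mbit/s claimable (factor 1.33)
# r=0.50 -> 1000 Mbit/s claimable (factor 2.0)
# r=0.75 -> 3000 Mbit/s claimable (factor 4.0)
```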
----------------------------------------
```
Filename: 316-flashflow.txt
Title: FlashFlow: A Secure Speed Test for Tor (Parent Proposal)
Author: Matthew Traudt, Aaron Johnson, Rob Jansen, Mike Perry
Created: 23 April 2020
Status: Draft
```
# 1. Introduction
FlashFlow is a new distributed bandwidth measurement system for Tor that
consists of a single authority node ("coordinator") instructing one or
more measurement nodes ("measurers") when and how to measure Tor relays.
A measurement consists of the following steps:
1. The measurement nodes demonstrate to the target relay permission to
perform measurements.
2. The measurement nodes open many TCP connections to the target relay
and create a one-hop circuit to the target relay on each one.
3. For 30 seconds the measurement nodes send measurement cells to the
target relay and verify that the cells echoed back match the ones
sent. During this time the relay caps the amount of background
traffic it transfers. Background and measurement traffic are
handled separately at the relay. Measurement traffic counts towards
all the standard existing relay statistics.
4. For every second during the measurement, the measurement nodes
report to the authority node how much traffic was echoed back. The
target relay also reports the amount of per-second background
(non-measurement) traffic.
5. The authority node sums the per-second reported throughputs into 30
sums (one for each second) and calculates the median. This is the
estimated capacity of the relay.
FlashFlow performs a measurement of every relay according to a schedule
described later in this document. Periodically it produces relay
capacity estimates in the form of a v3bw file, which is suitable for
direct consumption by a Tor directory authority. Alternatively an
existing load balancing system such as Simple Bandwidth Scanner could be
modified to use FlashFlow's v3bw file as input.
It is envisioned that each directory authority that wants to use
FlashFlow will run their own FlashFlow deployment consisting of a
coordinator that they run and one or more measurers that they trust
(e.g. because they run them themselves), similar to how each runs their
own Torflow/sbws. Section 5 of this proposal describes long term plans
involving multiple FlashFlow deployments. *FlashFlow coordinators do not
need
to communicate with each other*.
FlashFlow is more performant than Torflow: FlashFlow takes 5 hours to
measure the entire existing Tor network from scratch (with 3 Gbit/s
measurer capacity) while Torflow takes 2 days; FlashFlow measures relays
it hasn't seen recently as soon as it learns about them (i.e. every new
consensus) while Torflow can take a day or more; and FlashFlow
accurately measures new high-capacity relays the first time and every
time while Torflow takes days/weeks to assign them their full fair share
of bandwidth (especially for non-exits). FlashFlow is more secure than
Torflow: FlashFlow allows a relay to inflate its measured capacity by up
to 1.33x (configured by a parameter) while Torflow allows weight
inflation by a factor of 89x [0] or even 177x [1].
After an overview in section 2 of the planned deployment stages, sections
3, 4, and 5 discuss the short, medium, and long term deployment plans in
more detail.
# 2. Deployment Stages
FlashFlow's deployment shall be broken up into three stages.
In the short term we will implement a working FlashFlow measurement
system. This requires code changes in little-t tor and an external
FlashFlow codebase. The majority of the implementation work will be
done in the short term, and the product is a complete FlashFlow
measurement system. Remaining pieces (e.g. better authentication) are
added later for enhanced security and network performance.
In the medium term we will begin collecting data with a FlashFlow
deployment. The intermediate results and v3bw files produced will be
made available (semi?) publicly for study.
In the long term experiments will be performed to study ways of using FF
v3bw files to improve load balancing. Two examples: (1) using FF v3bw
files instead of sbws's (and eventually phasing out torflow/sbws), and
(2) continuing to run sbws but use FF's results as a better estimate of
relay capacity than observed bandwidth. Authentication and other
FlashFlow features necessary to make it completely ready for full
production deployment will be worked on during this long term phase.
# 3. FlashFlow measurement system: Short term
The core measurement mechanics will be implemented in little-t tor, but
a separate codebase for the FlashFlow side of the measurement system
will also be created. This section is divided into three parts: first a
discussion of changes/additions that logically reside entirely within
tor (essentially: relay-side modifications), second a discussion of the
separate FlashFlow code that also requires some amount of tor changes
(essentially: measurer-side and coordinator-side modifications), and
third a security discussion.
## 3.1 Little-T Tor Components
The primary additions/changes that entirely reside within tor on the
relay side:
- New torrc options/consensus parameters.
- New cell commands.
- Pre-measurement handshaking (with a simplified authentication
scheme).
- Measurement mode, during which the relay will echo traffic with
measurers, set a cap on the amount of background traffic it
transfers, and report the amount of transferred background traffic.
### 3.1.1 Parameters
FlashFlow will require some consensus parameters/torrc options. Each has
some default value if nothing is specified; the consensus parameter
overrides this default value; the torrc option overrides both.
FFMeasurementsAllowed: A global toggle on whether or not to allow
measurements. Even if all other settings would allow a measurement, if
this is turned off, then no measurement is allowed. Possible values: 0,
1. Default: 0 (disallowed).
FFAllowedCoordinators: The list of coordinator TLS certificate
fingerprints that are allowed to start measurements. Relays check their
torrc when they receive a connection from a FlashFlow coordinator to see
if it's on the list. If they have no list, they check the consensus
parameter. If neither exists, then no FlashFlow deployment is allowed to
measure this relay. Default: empty list.
FFMeasurementPeriod: A relay should expect, on average, to be measured by
each FlashFlow deployment once each measurement period. A relay will not
allow itself to be measured more than twice by a FlashFlow deployment in
any time window of this length. Relays should not change this option
unless they really know what they're doing. Changing it at the relay
will not change how often FlashFlow will attempt to measure the relay.
Possible values are in the range [1 hour, 1 month] inclusive. Default: 1
day.
FFBackgroundTrafficPercent: The maximum amount of regular
non-measurement traffic a relay should handle while being measured, as a
percent of total traffic (measurement + non-measurement). This
parameter is a trade off between having to limit background traffic and
limiting how much a relay can inflate its result by handling no
background traffic but reporting that it has done so. Possible values
are in the range [0, 99] inclusive. Default: 25 (a maximum inflation
factor of 1.33).
FFMaxMeasurementDuration: The maximum amount of time, in seconds, that
is allowed to pass from the moment the relay is notified that a
measurement will begin soon and the end of the measurement. If this
amount of time passes, the relay shall close all measurement connections
and exit its measurement mode. Note this duration includes handshake
time, thus it necessarily is larger than the expected actual measurement
duration. Possible values are in the range [10, 120] inclusive.
Default: 45.
### 3.1.2 New Cell Types
FlashFlow will introduce a new cell command MEASUREMENT.
The payload of each MEASUREMENT cell consists of:
```
Measure command [1 byte]
Data [varied]
```
The measure commands are:
```
0 -- MEAS_PARAMS [forward]
1 -- MEAS_PARAMS_OK [backward]
2 -- MEAS_BG [backward]
3 -- MEAS_ERR [forward and backward]
```
Forward cells are sent from the measurer/coordinator to the relay.
Backward cells are sent from the relay to the measurer/coordinator.
MEAS_PARAMS and MEAS_PARAMS_OK are used during the pre-measurement stage
to tell the target what to expect and for the relay to positively
acknowledge the message.
The target sends a MEAS_BG cell once per second to report
the amount of background traffic it is handling. MEAS_ERR cells are used
to signal to the other party that there has been some sort of problem
and that the measurement should be aborted. These measure commands are
described in more detail in the next section.
FlashFlow also introduces a new relay command, MEAS_ECHO. Relay cells with
this relay command are the measurement traffic. The measurer generates and
encrypts them, sends them to the target, the target decrypts them, then it
sends them back. A variation where the measurer skips encryption of MEAS_ECHO
cells in most cases is described in Appendix A, and was found to be necessary
in paper prototypes to save CPU load at the measurer.
MEASUREMENT cells, on the other hand, are not encrypted (beyond the regular
TLS on the connection).
### 3.1.3 Pre-Measurement Handshaking/Starting a Measurement
The coordinator establishes a one-hop circuit with the target relay and
sends
it a MEAS_PARAMS cell. If the target is unwilling to be measured at this
time
or if the coordinator didn't use a TLS certificate that the target
trusts, it
responds with an error cell and closes the connection. Otherwise it checks
that the parameters of the measurement are acceptable (e.g. the version is
acceptable, the duration isn't too long, etc.). If the target is happy, it
sends a MEAS_PARAMS_OK, otherwise it sends a MEAS_ERR and closes the
connection.
Upon learning the IP addresses of the measurers from the coordinator in
the MEAS_PARAMS cell, the target whitelists their IPs in its DoS
detection subsystem until the measurement ends (successfully or
otherwise), at which point the whitelist is cleared.
Upon receiving a MEAS_PARAMS_OK from the target, the coordinator will
instruct
the measurers to open their circuits (one circuit per connection) with the
target. If the coordinator or any measurer receives a MEAS_ERR, it
reports the
error to the coordinator and considers the measurement a failure. It is
also a
failure if any measurer is unable to open at least half of its circuits with
the target.
The payload of MEAS_PARAMS cells [XXX more may need to be added]:
```
- meas_duration [2 bytes] [1, 600]
- num_measurers [1 byte] [1, 10]
- measurer_info [num_measurers times]
```
meas_duration is the duration, in seconds, that the actual measurement will
last. num_measurers is how many link_specifier structs follow containing
information on the measurers that the relay should expect. Future
versions of
FlashFlow and MEAS_PARAMS will use TLS certificates instead of IP addresses.
[XXX probably need diff layout to allow upgrade to TLS certs instead of
link_specifier structs. probably using ext-type-length-value like teor
suggests]
[XXX want to specify number of conns to expect from each measurer here?]
MEAS_PARAMS_OK has no payload: it's just padding bytes to make the cell
PAYLOAD_LEN (509) bytes long.
The payload of MEAS_ECHO cells:
```
- arbitrary bytes [PAYLOAD_LEN bytes]
```
The payload of MEAS_BG cells [XXX more for extra info? like CPU usage]:
```
- second [2 byte] [1, 600]
- sent_bg_bytes [4 bytes] [0, 2^32-1]
- recv_bg_bytes [4 bytes] [0, 2^32-1]
```
second is the number of seconds since the measurement began. MEAS_BG
cells are sent once per second from the relay to the FlashFlow
coordinator. The first cell will have this set to 1, and each
subsequent cell will increment it by one. sent_bg_bytes is the number of
background traffic bytes sent in the last second (since the last MEAS_BG
cell). recv_bg_bytes is the same but for received bytes.
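A sketch of packing/unpacking this payload. The field sizes come from the layout above; network (big-endian) byte order is an assumption the proposal does not yet pin down:

```python
import struct

# MEAS_BG payload: second [2 bytes], sent_bg_bytes [4 bytes],
# recv_bg_bytes [4 bytes]; ">" = big-endian (assumed).
MEAS_BG_FMT = ">HII"

def pack_meas_bg(second, sent_bg_bytes, recv_bg_bytes):
    return struct.pack(MEAS_BG_FMT, second, sent_bg_bytes, recv_bg_bytes)

def unpack_meas_bg(payload):
    n = struct.calcsize(MEAS_BG_FMT)
    return struct.unpack(MEAS_BG_FMT, payload[:n])

cell = pack_meas_bg(1, 1_250_000, 1_240_000)
print(len(cell), unpack_meas_bg(cell))  # 10 bytes: 2 + 4 + 4
```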
The payload of MEAS_ERR cells [XXX need field for more info]:
```
- err_code [1 byte] [0, 255]
```
The error code is one of:
```
[... XXX TODO ...]
255 -- OTHER
```
### 3.1.4 Measurement Mode
The relay considers the measurement to have started the moment it
receives the first MEAS_ECHO cell from any measurer. At this point, the
relay
- Starts a repeating 1s timer on which it will report the amount of
background traffic to the coordinator over the coordinator's
connection.
- Enters "measurement mode" and limits the amount of background
traffic it handles according to the torrc option/consensus
parameter.
The relay decrypts and echoes back all MEAS_ECHO cells it receives on
measurement connections until it has reported its amount of background
traffic the same number of times as there are seconds in the measurement
(e.g. 30 per-second reports for a 30 second measurement). After sending
the last MEAS_BG cell, the relay drops all buffered MEAS_ECHO cells,
closes all measurement connections, and exits measurement mode.
During the measurement the relay targets a ratio of background traffic
to measurement traffic as specified by a consensus parameter/torrc
option. For a given ratio r, if the relay has handled x cells of
measurement traffic recently, Tor then limits itself to y = xr/(1-r)
cells of non-measurement traffic this scheduling round.
If x is very small, the relay will perform the calculation s.t. x is the
number of cells required to produce 10 Mbit/s of measurement traffic, thus
ensuring some minimum amount of background traffic is allowed.
[XXX teor suggests in [4] that the number 10 Mbit/s could be derived more
intelligently. E.g. based on AuthDirFastGuarantee or
AuthDirGuardBWGuarantee]
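The cap described above can be sketched as follows. The cell wire size and per-round timing used for the 10 Mbit/s floor are illustrative assumptions, not values fixed by the proposal:

```python
# Sketch: given x measurement cells handled recently and ratio r, allow
# y = x*r/(1-r) background cells, flooring x at the number of cells
# corresponding to 10 Mbit/s so some background traffic always gets through.
CELL_BYTES = 514     # assumed on-wire cell size
FLOOR_MBITS = 10

def bg_cell_budget(x_meas_cells, r, round_secs=0.1):
    # cells equivalent to 10 Mbit/s over one scheduling round (assumption)
    floor_cells = FLOOR_MBITS * 1_000_000 / 8 / CELL_BYTES * round_secs
    x = max(x_meas_cells, floor_cells)
    return x * r / (1 - r)
```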
## 3.2 FlashFlow Components
The FF coordinator and measurer code will reside in a FlashFlow
repository separate from little-t tor.
There are three notable parameters for which a FF deployment must choose
values. They are:
- The number of sockets, s, the measurers should open, in aggregate,
with the target relay. We suggest s=160 based on the FF paper.
- The bandwidth multiplier, m. Given an existing capacity estimate for
a relay, z, the coordinator will instruct the measurers to, in
aggregate, send m*z Mbit/s to the target relay. We recommend m=2.25.
- The measurement duration, d. Based on the FF paper, we recommend
d=30 seconds.
The rest of this section first discusses notable functions of the
FlashFlow coordinator, then goes on to discuss FF measurer code that
will require supporting tor code.
### 3.2.1 FlashFlow Coordinator
The coordinator is responsible for scheduling measurements, aggregating
results, and producing v3bw files. It needs continuous access to new
consensus files, which it can obtain by running an accompanying Tor
process in client mode.
The coordinator has the following functions, which will be described in
this section:
- aggregating results
- scheduling measurements
- generating v3bw files
#### 3.2.1.1 Aggregating Results
Every second during a measurement, the measurers send the amount of
verified measurement traffic they have received back from the relay.
Additionally, the relay sends a MEAS_BG cell each second to the
coordinator with the amount of non-measurement background traffic it is
sending and receiving.
For each second's reports, the coordinator sums the measurers' reports.
The coordinator takes the minimum of the relay's reported sent and
received background traffic. If, when compared to the measurers' reports
for this second, the relay's claimed background traffic is more than
what's allowed by the background/measurement traffic ratio, then the
coordinator further clamps the relay's report down. The coordinator adds
this final adjusted amount of background traffic to the sum of the
measurers' reports.
Once the coordinator has done the above for each second in the
measurement (e.g. 30 times for a 30 second measurement), the coordinator
takes the median of the 30 per-second throughputs and records it as the
estimated capacity of the target relay.
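The aggregation just described can be sketched compactly (inputs and units are illustrative; all series are per-second values over the measurement):

```python
import statistics

# Sketch of coordinator-side aggregation: for each second, take the
# smaller direction of the relay's claimed background traffic, clamp it
# to what the ratio r allows relative to verified measurement traffic,
# add it to the measurers' sum, then take the median across seconds.
def estimate_capacity(meas_sums, bg_sent, bg_recv, r):
    per_second = []
    for m, s, v in zip(meas_sums, bg_sent, bg_recv):
        bg = min(s, v)                 # smaller of sent/received claims
        bg = min(bg, m * r / (1 - r))  # clamp to the allowed ratio
        per_second.append(m + bg)
    return statistics.median(per_second)
```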
#### 3.2.1.2 Measurement Schedule
The short term implementation of measurement scheduling will be simpler
than the long term one due to (1) there only being one FlashFlow
deployment, and (2) there being very few relays that support being
measured by FlashFlow. In fact the FF coordinator will maintain a list
of the relays that have updated to support being measured and have opted
in to being measured, and it will only measure them.
The coordinator divides time into a series of 24 hour periods, commonly
referred to as days. Each period has measurement slots that are longer
than a measurement lasts (30s), say 60s, to account for pre- and
post-measurement work. Thus with 60s slots there are 1,440 slots in a
day.
At the start of each day the coordinator considers the list of relays
that have opted in to being measured. From this list of relays, it
repeatedly takes the relay with the largest existing capacity estimate.
It selects a random slot. If the slot has existing relays assigned to
it, the coordinator makes sure there is enough additional measurer
capacity to handle this relay. If so, it assigns this relay to this
slot. If not, it keeps picking new random slots until one has sufficient
additional measurer capacity.
Relays without existing capacity estimates are assumed to have the 75th
percentile capacity of the current network.
If a relay is not online when it's scheduled to be measured, it doesn't
get measured that day.
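The assignment loop above can be sketched as follows (a simplified model: it assumes every relay alone fits within the deployment's measurer capacity, and ignores percentile defaults and offline relays):

```python
import random

# Sketch: assign relays (capacities in Mbit/s) to random slots, largest
# first, retrying until a slot has enough spare measurer capacity.
# m is the bandwidth multiplier; measurer_capacity is the aggregate
# capacity of the deployment's measurers.
def schedule(capacities, n_slots, measurer_capacity, m, rng=random):
    slots = [[] for _ in range(n_slots)]
    used = [0.0] * n_slots
    for cap in sorted(capacities, reverse=True):
        need = cap * m
        while True:
            i = rng.randrange(n_slots)
            if used[i] + need <= measurer_capacity:
                slots[i].append(cap)
                used[i] += need
                break
    return slots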
##### 3.2.1.2.1 Example
Assume the FF deployment has 1 Gbit/s of measurer capacity. Assume the
chosen multiplier m=2. Assume there are only 5 slots in a measurement
period.
Consider a set of relays with the following existing capacity estimates
and that have opted in to being measured by FlashFlow.
- 500 Mbit/s
- 300 Mbit/s
- 250 Mbit/s
- 200 Mbit/s
- 100 Mbit/s
- 50 Mbit/s
The coordinator takes the largest relay, 500 Mbit/s, and picks a random
slot for it. It picks slot 3. The coordinator takes the next largest,
300, and randomly picks slot 2. The slots are now:
```
0 | 1 | 2 | 3 | 4
-------|-------|-------|-------|-------
| | 300 | 500 |
| | | |
```
The coordinator takes the next largest, 250, and randomly picks slot 2.
Slot 2 already has 600 Mbit/s of measurer capacity reserved (300*m);
given just 1000 Mbit/s of total measurer capacity, there is just 400
Mbit/s of spare capacity while this relay requires 500 Mbit/s. There is
not enough room in slot 2 for this relay. The coordinator picks a new
random slot, 0.
```
0 | 1 | 2 | 3 | 4
-------|-------|-------|-------|-------
250 | | 300 | 500 |
| | | |
```
The next largest is 200 and the coordinator randomly picks slot 2 again
(wow!). As there is just enough spare capacity, the coordinator assigns
this relay to slot 2.
```
0 | 1 | 2 | 3 | 4
-------|-------|-------|-------|-------
250 | | 300 | 500 |
| | 200 | |
```
The coordinator randomly picks slot 4 for the last remaining relays, in
that order.
```
0 | 1 | 2 | 3 | 4
-------|-------|-------|-------|-------
250 | | 300 | 500 | 100
| | 200 | | 50
```
#### 3.2.1.3 Generating V3BW files
Every hour the FF coordinator produces a v3bw file in which it stores
the latest capacity estimate for every relay it has measured in the last
week. The coordinator will create this file on the host's local file
system. Previously-generated v3bw files will not be deleted by the
coordinator. A symbolic link at a static path will always point to the
latest v3bw file.
```
$ ls -l
v3bw -> v3bw.2020-03-01-05-00-00
v3bw.2020-03-01-00-00-00
v3bw.2020-03-01-01-00-00
v3bw.2020-03-01-02-00-00
v3bw.2020-03-01-03-00-00
v3bw.2020-03-01-04-00-00
v3bw.2020-03-01-05-00-00
```
[XXX Either FF should auto-delete old ones, logrotate config should be
provided, a script provided, or something to help bwauths not accidentally
fill up their disk]
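One possible shape for the rotation/cleanup helper mentioned in the XXX above (directory layout and the one-week retention are illustrative):

```shell
#!/bin/sh
# Sketch: repoint the "v3bw" symlink at the newest file and prune files
# older than a week so the bwauth's disk doesn't slowly fill up.
set -eu
cd "$(mktemp -d)"                        # stand-in for the real output dir
touch v3bw.2020-03-01-04-00-00 v3bw.2020-03-01-05-00-00
newest=$(ls v3bw.2* | sort | tail -n 1)  # timestamps sort lexicographically
ln -sfn "$newest" v3bw                   # replace the symlink in one step
find . -name 'v3bw.2*' -mtime +7 -delete
readlink v3bw
```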
[XXX What's the approximate disk usage for, say, a few years of these?]
### 3.2.2 FlashFlow Measurer
The measurers take commands from the coordinator, connect to target
relays with many sockets, send them traffic, and verify the received
traffic is the same as what was sent.
Notable new things that internal tor code will need to do on the
measurer (client) side:
1. Open many TLS+TCP connections to the same relay on purpose.
#### 3.2.2.1 Open many connections
FlashFlow prototypes needed to "hack in" a flag in the
open-a-connection-with-this-relay function call chain that indicated
whether or not we wanted to force a new connection to be created. Most
of Tor doesn't care if it reuses an existing connection, but FF does
want to create many different connections. The cleanest way to
accomplish this will be investigated.
On the relay side, these measurer connections do not count towards DoS
detection algorithms.
## 3.3 Security
In this section we discuss the security of various aspects of FlashFlow
and the tor changes it requires.
### 3.3.1 Weight Inflation
Target relays are an active part of the measurement process; they know
they are getting measured. While a relay cannot fake the measurement
traffic, it can trivially stop transferring client background traffic
for the duration of the measurement yet claim it carried some. More
generally, there is no verification of the claimed amount of background
traffic during the measurement. The relay can claim whatever it wants,
but FlashFlow will not trust any claim above the background/measurement
ratio its deployment is configured to use. This places an
easy-to-understand, firm, and (if set as we suggest) low cap on how much
a relay can inflate its measured capacity.
Consider a background/measurement ratio of 1/4, or 25%. Assume the relay
in question has a hard limit on capacity (e.g. from its NIC) of 100
Mbit/s. The relay is supposed to use up to 25% of its capacity for
background traffic and the remaining 75%+ capacity for measurement
traffic. Instead the relay ceases carrying background traffic, uses all
100 Mbit/s of capacity to handle measurement traffic, and reports ~33
Mbit/s of background traffic (33/133 = ~25%). FlashFlow would trust this
and consider the relay capable of 133 Mbit/s. (If the relay were to
report more than ~33 Mbit/s, FlashFlow limits it to just ~33 Mbit/s.)
With r=25%, FlashFlow only allows 1.33x weight inflation.
Prior work shows that Torflow allows weight inflation by a factor of 89x
[0] or even 177x [1].
The ratio chosen is a trade-off between impact on background traffic and
security: r=50% allows a relay to double its weight but won't impact
client traffic for relays with steady state throughput below 50%, while
r=10% allows a very low inflation factor but will cause throttling of
client traffic at far more relays. We suggest r=25% (and thus
1/(1-0.25)=1.33x inflation) for a reasonable trade-off between
performance and security.
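The cap follows directly from the ratio; a quick sketch of the
arithmetic (the function name is mine):

```python
def max_inflation(r):
    """Maximum weight inflation under background ratio r.

    A cheating relay spends its full capacity c on measurement traffic
    and claims background traffic b chosen so that b / (c + b) = r; the
    trusted total c + b then equals c / (1 - r).
    """
    return 1 / (1 - r)

# r = 0.25 caps inflation at ~1.33x; r = 0.50 would allow 2x.
```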
It may be possible to catch relays performing this attack, especially if
they literally drop all background traffic during the measurement: have
the measurer (or some party on its behalf) create a regular stream
through the relay and measure the throughput on the stream
before/during/after the measurement. This can be explored longer term.
### 3.3.2 Incomplete Authentication
The short term FlashFlow implementation has the relay set two torrc
options if they would like to allow themselves to be measured: a flag
allowing measurement, and the list of coordinator TLS certificates that
are allowed to start a measurement.
The relay drops MEAS_PARAMS cells from coordinators it does not trust,
and immediately closes the connection after that. A FF coordinator
cannot convince a relay to enter measurement mode unless the relay
trusts its TLS certificate.
A trusted coordinator specifies in the MEAS_PARAMS cell the IP addresses
of the measurers that the relay should expect to connect to it shortly.
target adds the measurer IP addresses to a whitelist in the DoS
connection limit system, exempting them from any configured connection
limit. If a measurer is behind a NAT, an adversary behind the same NAT
can DoS the relay's available sockets until the end of the measurement.
The adversary could also pretend to be the measurer. Such an adversary
could induce measurement failures and inaccuracies. (Note: the whitelist
is cleared after the measurement is over.)
# 4. FlashFlow measurement system: Medium term
The medium term deployment stage begins after FlashFlow has been
implemented and relays are starting to update to a version of Tor that
supports it.
New link- and relay-subprotocol versions will be used by the relay to
indicate FF support. E.g. at the time of writing, the next relay
subprotocol version is 4 [3].
We plan to host a FlashFlow deployment consisting of a FF coordinator
and a single FF measurer on a single 1 Gbit/s machine. Data produced by
this deployment will be made available (semi?) publicly, including both
v3bw files and intermediate results.
Any development changes needed during this time would go through
separate proposals.
# 5. FlashFlow measurement system: Long term
In the long term, finishing-touch development work will be done,
including adding better authentication and measurement scheduling, and
experiments will be run to determine the best way to integrate FlashFlow
into the Tor ecosystem.
Any development changes needed during this time would go through
separate proposals.
## 5.1 Authentication to Target Relay
Short term deployment already had FlashFlow coordinators using TLS
certificates when connecting to relays, but in the long term, directory
authorities will vote on the consensus parameter for which coordinators
should be allowed to perform measurements. The voting is done in the
same way they currently vote on recommended tor versions.
FlashFlow measurers will be updated to use TLS certificates when
connecting to relays too. FlashFlow coordinators will update the
contents of MEAS_PARAMS cells to contain measurer TLS certificates
instead of IP addresses, and relays will update to expect this change.
## 5.2 Measurement Scheduling
Short term deployment only has one FF deployment running. Long term this
may no longer be the case because, for example, more than one directory
authority decides to adopt it and they each want to run their own
deployment. FF deployments will need to coordinate between themselves
to not measure the same relay at the same time, and to handle new relays
as they join during the middle of a measurement period (during the day).
The measurement scheduling process shall be non-interactive. All the
inputs (e.g. the shared random value, the identities of the
coordinators, the relays currently in the network) are publicly known to
(at least) the bwauths, so each individual bwauth can calculate the same
multi-coordinator measurement schedule.
The following is quoted from Section 4.3 of the FlashFlow paper.
To measure all relays in the network, the BWAuths periodically
determine the measurement schedule. The schedule determines when and
by whom a relay should be measured. We assume that the BWAuths have
sufficiently synchronized clocks to facilitate coordinating their
schedules. A measurement schedule is created for each measurement
period, the length p of which determines how often a relay is
measured. We use a measurement period of p = 24 hours.
To help avoid active denial-of-service attacks on targeted relays,
the measurement schedule is randomized and known only to the
BWAuths. Before the next measurement period starts, the BWAuths
collectively generate a random seed (e.g. using Tor’s
secure-randomness protocol). Each BWAuth can then locally determine
the shared schedule using pseudorandom bits extracted from that
seed. The algorithm to create the schedule considers each
measurement period to be divided into a sequence of t-second
measurement slots. For each old relay, slots for each BWAuth to
measure it are selected uniformly at random without replacement
from all slots in the period that have sufficient unallocated
measurement capacity to accommodate the measurement. When a new
relay appears, it is measured separately by each BWAuth in the first
slots with sufficient unallocated capacity. Note that this design
ensures that old relays will continue to be measured, with new
relays given secondary priority in the order they arrive.
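The non-interactive property rests on every bwauth expanding the same
shared seed into the same pseudorandom schedule. A minimal sketch,
assuming SHA-256 seed expansion and ignoring the capacity checks
described earlier (all names are illustrative):

```python
import hashlib
import random

def shared_schedule(seed, relay_ids, n_slots):
    """Derive an identical schedule at every bwauth from a shared seed.

    Each bwauth seeds a PRNG with a hash of the shared random value and
    walks the relays in a canonical order, so no further communication
    is needed. Capacity constraints are omitted for brevity.
    """
    rng = random.Random(hashlib.sha256(seed).digest())
    return {relay: rng.randrange(n_slots) for relay in sorted(relay_ids)}
```

Because the relay list is sorted before slots are drawn, two bwauths
that agree on the seed and the relay set derive the same schedule even
if they enumerate relays in different orders.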
[XXX Teor leaves good ideas in his tor-dev@ post [5], including a good
plain-language description of what the FF paper quote says, and a
recommendation on which consensus to use when making a new schedule]
A problem arises when two relays are hosted on the same machine but
measured at different times: each will be measured as having the full
capacity of their shared host. At the very least, the scheduling
algorithm should schedule relays with the same IP to be measured at the
same time. Perhaps better is measuring all relays in the same MyFamily,
the same ipv4/24, and/or the same ipv6/48 at the same time. What
specifically to do here is left for medium/long term work.
## 5.3 Experiments
[XXX todo]
## 5.4 Other Changes/Investigations/Ideas
- How can FlashFlow data be used in a way that doesn't lead to poor load
balancing given the following items that lead to non-uniform client
behavior:
- Guards that high-traffic HSs choose (for 3 months at a time)
- Guard vs middle flag allocation issues
- New Guard nodes (Guardfraction)
- Exit policies other than default/all
- Directory activity
- Total onion service activity
- Super long-lived circuits
- Add a cell that the target relay sends to the coordinator indicating
its CPU and memory usage, whether it has a shortage of sockets, how
much bandwidth load it has been experiencing lately, etc. Use this
information to lower a relay's weight, never to increase it.
- If FlashFlow and sbws work together (as opposed to FlashFlow replacing
sbws), consider logic for how much sbws can increase/decrease FF
results
- Coordination of multiple FlashFlow deployments: scheduling of
measurements, seeding schedule with shared random value.
- Other background/measurement traffic ratios. Dynamic? (known slow
relay => more allowed bg traffic?)
- Catching relays inflating their measured capacity by dropping
background traffic.
- What to do about co-located relays. Can they be detected reliably?
Should we just add a torrc option a la MyFamily for co-located relays?
- What is the explanation for dennis.jackson's scary graphs in this [2]
ticket? Was it because of the speed test? Why? Will FlashFlow produce
the same behavior?
# Citations
[0] F. Thill. Hidden Service Tracking Detection and Bandwidth Cheating
in Tor Anonymity Network. Master’s thesis, Univ. Luxembourg, 2014.
https://www.cryptolux.org/images/b/bc/Tor_Issues_Thesis_Thill_Fabrice.pdf
[1] A. Johnson, R. Jansen, N. Hopper, A. Segal, and P. Syverson.
PeerFlow: Secure Load Balancing in Tor. Proceedings on Privacy
Enhancing Technologies (PoPETs), 2017(2), April 2017.
https://ohmygodel.com/publications/peerflow-popets2017.pdf
[2] Mike Perry: Graph onionperf and consensus information from Rob's
experiments
https://trac.torproject.org/projects/tor/ticket/33076
[3] tor-spec.txt Section 9.3 "Relay" Subprotocol versioning
https://gitweb.torproject.org/torspec.git/tree/tor-spec.txt#n2132
[4] Teor's second response to FlashFlow proposal
https://lists.torproject.org/pipermail/tor-dev/2020-April/014251.html
[5] Teor's first response to FlashFlow proposal
https://lists.torproject.org/pipermail/tor-dev/2020-April/014246.html
# Appendix A: Save CPU at measurer by not encrypting all MEAS_ECHO cells
## Verify echo cells
A parameter will exist to tell the measurers with what frequency they
shall verify that cells echoed back to them match what was sent. This
parameter does not need to exist outside of the FF deployment (e.g. it
doesn't need to be a consensus parameter).
The parameter instructs the measurers to check 1 out of every N cells.
The measurer keeps a count of how many measurement cells it has sent. It
also logically splits its output stream of cells into buckets of size N.
At the start of each bucket (when num_sent % N == 0), the measurer
chooses a random index in the bucket. Upon sending the cell at that
index (num_sent % N == chosen_index), the measurer records the cell.
The measurer also counts cells that it receives. When it receives a cell
at an index that was recorded, it verifies that the received cell
matches the recorded sent cell. If they match, no special action is
taken. If they don't match, the measurer indicates failure to the
coordinator and target relay and closes all connections, ending the
measurement.
### Example
Consider bucket_size is 1000. For the moment ignore cell encryption.
We start at idx=0 and pick an idx in [0, 1000) to record, say 640. At
idx=640 we record the cell. At idx=1000 we choose a new idx in [1000,
2000) to record, say 1236. At idx=1236 we record the cell. At idx=2000
we choose a new idx in [2000, 3000). Etc.
There are 2000+ cells in flight and the measurer has recorded two items:
```
- (640, contents_of_cellA)
- (1236, contents_of_cellB)
```
Consider the receive side now. It counts the cells it receives. At
receive idx=640, it checks the received cell matches the saved cell from
before. At receive idx=1236, it again checks the received cell matches.
Etc.
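The sender/receiver bookkeeping described above can be sketched as
follows (a simplified sketch; the class and method names are mine):

```python
import random

class EchoVerifier:
    """Spot-check 1 out of every bucket_size echoed cells."""

    def __init__(self, bucket_size):
        self.n = bucket_size
        self.sent = 0          # cells sent so far
        self.recv = 0          # cells received so far
        self.pick = None       # absolute index to record in this bucket
        self.recorded = {}     # absolute send index -> cell contents

    def on_send(self, cell):
        if self.sent % self.n == 0:
            # New bucket: choose one random index in it to record.
            self.pick = self.sent + random.randrange(self.n)
        if self.sent == self.pick:
            self.recorded[self.sent] = cell
        self.sent += 1

    def on_recv(self, cell):
        """Return False if a recorded cell does not match its echo."""
        idx = self.recv
        self.recv += 1
        if idx in self.recorded:
            return self.recorded.pop(idx) == cell
        return True
```

A real measurer would abort the measurement and notify the coordinator
on the first False, as described above.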
### Motivation
A malicious relay may want to skip decryption of measurement cells to
save CPU cycles and obtain a higher capacity estimate. More generally,
it could generate fake measurement cells locally, ignore the measurement
traffic it is receiving, and flood the measurer with more traffic than
it (the measurer) is even sending.
The security of echo cell verification is discussed in section 3.3.1.
### Security
A smaller bucket size means more cells are checked and FF is more likely
to detect a malicious target. It also means more bookkeeping overhead
(CPU/RAM).
An adversary that knows bucket_size and cheats on one item out of every
bucket_size items will have a 1/bucket_size chance of getting caught in
the first bucket. This is the worst case adversary. While cheating on
just a single item per bucket yields very little advantage, cheating on
more items per bucket increases the likelihood the adversary gets
caught. Thus only the worst case is considered here.
In general, the odds the adversary can successfully cheat in a single
bucket are
```
(bucket_size-1)/bucket_size
```
Thus the odds the adversary can cheat in X consecutive buckets are
```
[(bucket_size-1)/bucket_size]^X
```
In our case, X will be highly varied: Slow relays won't see very many
buckets, but fast relays will. The damage to the network a very slow
relay can do by faking being only slightly faster is limited.
Nonetheless, for now we motivate the selection of bucket_size with a
slow relay:
- Assume a very slow relay of 1 Mbit/s capacity that will cheat 1 cell
in each bucket. Assume a 30 second measurement.
- The relay will handle 1*30 = 30 Mbit of traffic during the
measurement, or 3.75 MB, or 3.75 million bytes.
- Cells are 514 bytes. Approximately (e.g. ignoring TLS) 7300 cells
will be sent/recv over the course of the measurement.
- A bucket_size of 50 results in about 146 buckets over the course of
the 30s measurement.
- Therefore, the odds of the adversary cheating successfully are
(49/50)^146, or about 5.2%.
This sounds high, but a relay capable of double the bandwidth (2 Mbit/s)
will have (49/50)^(2*146) or 0.2% odds of success, which is quite low.
Wanting a <1% chance that a 10 Mbit/s relay can successfully cheat
results in a bucket size of approximately 125:
- 10*30 = 300 Mbit of traffic during 30s measurement. 37.5 million
bytes.
- 37,500,000 bytes / 514 bytes/cell = ~73,000 cells
- bucket_size of 125 cells means 73,000 / 125 = 584 buckets
- (124/125)^(584) = 0.918% chance of successfully cheating
Slower relays can cheat more easily but the amount of extra weight they
can obtain is insignificant in absolute terms. Faster relays are
essentially unable to cheat.
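The figures above can be reproduced directly; a small sketch (the
function and its defaults are mine):

```python
def cheat_success_odds(rate_mbps, seconds=30, cell_bytes=514, bucket_size=50):
    """Odds that a relay faking one cell per bucket evades detection
    for the whole measurement."""
    # Cells sent during the measurement, then buckets they fall into.
    cells = rate_mbps * seconds * 1_000_000 / 8 / cell_bytes
    buckets = cells / bucket_size
    # Per-bucket escape odds, compounded over every bucket.
    return ((bucket_size - 1) / bucket_size) ** buckets

# 1 Mbit/s relay, bucket_size 50: ~5.2% chance of evading detection.
# 10 Mbit/s relay, bucket_size 125: under 1%.
```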
Hi all,
I was asked by George to submit my comments about the proposal and to suggest
suitable RandomX parameters for this PoW scheme.
> For our dynamic PoW system to work, we will need to be able to compare PoW
> tokens with each other. To do so we define a function:
> unsigned effort(uint8_t *token)
> which takes as its argument a hash output token, and returns the number of
> leading zero bits on it.
This definition makes the effort exponential, i.e. the computational resources
required to reach one notch higher effort increase by a factor of 2 each time.
I suggest using linear effort, defined as the quotient of dividing a
bitstring of 1s by the hash value:
== Example A:
effort(00000001100010101101) = 11111111111111111111 / 00000001100010101101
or in decimal:
effort(6317) = 1048575 / 6317 = 165.
This definition of effort has the advantage of directly expressing the expected
number of hashes that the client had to calculate to reach the effort.
With the exponential definition, we could have an equivalent linear effort of
either 128 (7 leading zeroes) or 256 (8 leading zeroes), while the linear
definition provides smoother classification of PoW results.
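In code, the linear effort of Example A looks like this (assuming a
20-bit hash for illustration; a real hash output would be larger):

```python
def effort(hash_value, hash_bits=20):
    """Linear effort: an all-ones bitstring divided by the hash value,
    approximating the expected number of hashes the client computed."""
    max_result = (1 << hash_bits) - 1
    return max_result // hash_value
```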
> The EXT_FIELD content format is:
>
> POW_VERSION [1 byte]
> POW_NONCE [32 bytes]
>
I suggest using a 16-byte nonce value, which is more than sufficient
given the target security level and has the benefit of halving the
replay cache size.
Since we expect the seed to be valid for around 3 hours (as proposed), then even
if the service receives 1 million proofs per second and each proof has an effort
of 1 million, then the number of submitted nonces from clients will only reach
about 10^10. With a 108-bit solution space (subtracting 20 bits as the search
space per client), the probability that two clients accidentally submit the same
nonce is roughly 10^-13 (see [REF_BIRTHDAY]).
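The collision estimate follows from the standard birthday approximation
p ≈ n^2 / 2^(b+1); a quick check (the function name is mine):

```python
def collision_probability(n, space_bits):
    """Birthday approximation p ≈ n^2 / 2^(bits + 1), for n << 2^bits."""
    return n * n / 2 ** (space_bits + 1)

# ~10^10 submitted nonces in a 108-bit space => roughly 10^-13.
```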
Additionally, I suggest adding the client's effort for convenience:
The updated EXT_FIELD content format would be:
POW_VERSION [1 byte]
POW_EFFORT [4 bytes]
POW_NONCE [16 bytes]
Including the effort has 2 benefits:
1. In case the Intro Priority Queue is full, the service doesn't need to
waste time verifying PoW solutions that have effort lower than the last
element in the queue. While an attacker can always lie about the actual
effort of their nonce value, I think this field can still save some CPU
cycles when the service is under heavy load.
2. The service can conveniently verify the reported effort with
the following inequality:
POW_EFFORT * pow_function(POW_NONCE, SEED) <= MAX_RESULT
where MAX_RESULT is the highest possible output from the pow_function.
In the case of Example A, that would be:
165 * 6317 = 1042305 <= 1048575
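The verification inequality, using the 20-bit toy values of Example A
(the helper name is mine):

```python
def verify_effort(pow_effort, hash_value, hash_bits=20):
    """Accept iff pow_effort * hash_value <= MAX_RESULT, where
    MAX_RESULT is the largest possible hash output."""
    max_result = (1 << hash_bits) - 1
    return pow_effort * hash_value <= max_result

# The proof of Example A verifies at effort 165 but not at 166.
```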
> Similar to how our cell scheduler works, the onion service subsystem will
> poll the priority queue every 100ms tick and process the first 20 cells from
> the priority queue (if they exist). The service will perform the rendezvous
> and the rest of the onion service protocol as normal.
I suggest using a different selection method rather than always taking
the first 20 requests.
Selecting cells from the front of the queue actually minimizes the effort that
an attacker needs to expend to completely block access to the service.
The attacker can always submit the minimum effort required to occupy the first
20 slots in each 100 ms window. This can be done by submitting just 200
requests per second, observing how many circuits are successfully
opened, and adjusting the effort until no other request can go through.
This is the "Total overwhelm strategy" described in § 5.1.1.
See the following examples, which show how selecting the first N cells
from the queue is unfair. I will use N = 4 for clarity. E denotes some
value of effort.
== Example B:
          ATTACKER                            LEGITIMATE CLIENTS
 ___________________________________  ______________________________________
/                                   \/                                      \
+--------+--------+--------+--------+----+----+----+----+----+----+----+----+
|   2E   |   2E   |   2E   |   2E   | E  | E  | E  | E  | E  | E  | E  | E  |
+--------+--------+--------+--------+----+----+----+----+----+----+----+----+
    ^        ^        ^        ^
 selected selected selected selected
Here both the attacker and the legitimate clients have expended a combined
effort of 8E. All of the attacker's cells get selected, while none of the other
clients get through.
Instead, I suggest using Stochastic universal sampling [REF_SUS], where
the probability that a cell gets selected is proportional to its effort.
== Example C:
Here the total effort in the queue is 16E, so we first select a random value in
the interval [0, 4E) and then select 4 evenly spaced cells:
          ATTACKER                            LEGITIMATE CLIENTS
 ___________________________________  ______________________________________
/                                   \/                                      \
+--------+--------+--------+--------+----+----+----+----+----+----+----+----+
|   2E   |   2E   |   2E   |   2E   | E  | E  | E  | E  | E  | E  | E  | E  |
+--------+--------+--------+--------+----+----+----+----+----+----+----+----+
    ^                 ^                     ^                   ^
 selected          selected              selected            selected
In this case, 2 cells are selected from each group, which is fairer considering
that each group expended the same effort.
== Example D:
Now if the attacker wanted to block access to all legitimate clients, they would
need to at least quadruple their total PoW effort (and there would still be
a chance that a legitimate client gets selected from the end of the queue):
                          ATTACKER                        LEGITIMATE CLIENTS
 ___________________________________________________________  ______________
/                                                           \/               \
+--------------+--------------+--------------+--------------+-+-+-+-+-+-+-+-+
|      8E      |      8E      |      8E      |      8E      |E|E|E|E|E|E|E|E|
+--------------+--------------+--------------+--------------+-+-+-+-+-+-+-+-+
       ^              ^              ^              ^
    selected       selected       selected       selected
Note: With linear effort, the original selection method becomes even worse
because the attacker can occupy the front of the queue with an effort of just
E+1 per cell rather than 2E.
In some cases, Stochastic universal sampling can select a single element
multiple times, which is not a problem for genetic algorithms, but we
want to avoid it. I suggest restarting the selection from the beginning
of the next cell in those cases, shortening the sampling interval
according to the remaining portion of the queue. See the following
example.
== Example E:
+----------------+--------+------+-+-+-+-+-+-+-+-+-+-+
|      16E       |   8E   |  6E  |E|E|E|E|E|E|E|E|E|E|
+----------------+--------+------+-+-+-+-+-+-+-+-+-+-+
        ^            ^       ^          ^
     selected     selected selected  selected
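The restart variant can be sketched as follows (an illustrative
implementation of the idea, not a specification; all names are mine):

```python
import random

def sus_select(efforts, k, rng=random):
    """Stochastic universal sampling with a restart rule: if a pointer
    would land in an already-selected cell, selection restarts from the
    next cell with the interval rescaled to the remaining queue."""
    selected = []
    start = 0
    while len(selected) < k and start < len(efforts):
        need = k - len(selected)
        step = sum(efforts[start:]) / need
        point = rng.uniform(0, step)
        cum = 0.0
        restarted = False
        for i in range(start, len(efforts)):
            cum += efforts[i]
            if point < cum and len(selected) < k:
                selected.append(i)
                point += step
                if point < cum:
                    # This cell would be selected twice: restart after it.
                    start = i + 1
                    restarted = True
                    break
        if not restarted:
            break
    return selected
```

With the queue of Example E (16E, 8E, 6E, then ten E cells) and k = 4,
the first-round interval is 10E, so the 16E cell always absorbs the
first pointer and can never be selected twice.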
> In particular, the service starts with a default suggested-effort value of 15.
>
> Everytime the service handles an introduction request from the priority queue
> in [HANDLE_QUEUE], the service compares the request's effort to the current
> suggested-effort value. If the new request's effort is lower than the
> suggested-effort, set the suggested-effort equal to the effort of the new
> request.
>
> Everytime the service trims the priority queue in [HANDLE_QUEUE], the service
> compares the request at the trim point against the current suggested-effort
> value. If the trimmed request's effort is higher than the suggested-effort,
> set the suggested-effort equal to the effort of the new request.
I think the default suggested-effort is a bit too high. Assuming each
attempt takes around 1 ms to calculate, the client would need, on
average, 2^15 ms = ~33 seconds to find a solution using 1 CPU core. I
suggest specifying a minimum effort instead. Requests with
effort < MIN_EFFORT and requests without the PROOF_OF_WORK extension
would be treated as having effort = 1 for the purposes of the sampling
algorithm.
Secondly, the proposed method of calculating the suggested-effort is
susceptible to gaming by attackers. Since the service can only update
the value in the directory once every 'hs-pow-desc-upload-rate-limit'
seconds, an attacker could pause the attack for a little while just
before the directory gets updated, which would cause the
suggested-effort value to be too low despite an ongoing attack.
I suggest taking the median effort of the selected cells during each
100 ms window. For the examples above that would be:
Example C: 1.5E
Example D: 8E
Example E: 7E
Then I would take the median of these values over the directory update period.
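With E = 1, the per-window medians for the examples above can be checked
directly using Python's statistics module (the wrapper name is mine):

```python
from statistics import median

def suggested_effort(selected_per_tick):
    """Median effort of the selected cells in each 100 ms tick, then
    the median of those per-tick values over the upload period."""
    return median(median(tick) for tick in selected_per_tick)

# Examples C, D, E: selected efforts per tick, in units of E = 1.
ticks = [[2, 2, 1, 1], [8, 8, 8, 8], [16, 8, 6, 1]]
# Per-tick medians: 1.5, 8, 7; overall suggested effort: 7.
```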
>
> 5. Attacker strategies
>
Additional attacks:
5.1.2 PoW Spam Attack
The attacker may try to spam many requests with random values of POW_NONCE,
requiring the service to waste cycles verifying the invalid proofs. No such
request would make it into the Intro Queue, but it may still be a viable DoS
strategy depending on proof verification time and the number of intro requests
that can be practically delivered through the network.
5.1.3 Precomputed PoW attack
The attacker may precompute many valid PoW nonces and submit them all at once
before the current seed expires, overwhelming the service temporarily even
using a single computer.
> 4.2. Seed expiration issues
>
> As mentioned in [DESC_POW], the expiration timestamp on the PoW seed can
> cause issues with clock skewed clients. Furthermore, even not clock skewed
> clients can encounter TOCTOU-style race conditions here.
>
> The client descriptor refetch logic of [CLIENT_TIMEOUT] should take care of
> such seed-expiration issues, since the client will refetch the descriptor.
>
I suggest using 2 concurrent seeds, i.e. accepting PoW from both the
current and the previous seed epoch. We use this approach in Monero. It
would, however, require adding the seed value to the proof-of-work
extension field and would also double the memory requirements for
verification with RandomX.
> The proposal suggests argon2, and Mike has been looking at Randomx. However,
> after further consideration and speaking with some people (props to Alex
> Biryukov), it seems like those two functions are not well fitted for this
> purpose, since they are memory-hard both for the client and the service. And
> since we are trying to minimize the verification overhead, so that the
> service can do hundreds of verifications per second, they don't seem like
> good fits.
Asymmetric function like Equihash, Cuckoo cycle [REF_CUCKOO] and MTP have the
advantage of being very fast to verify, but they run much faster on GPUs and
specialized hardware, so I don't think they are particularly suitable for this
purpose.
When we designed RandomX to be used as the PoW algorithm by Monero, we selected
the parameters very conservatively to maximize the ASIC resistance of
the algorithm. That's because it is very difficult to change the PoW algorithm
once it's deployed.
Tor is not limited in this way, so we can be much more aggressive when
configuring RandomX. I suggest using a configuration that gives the
fastest possible verification time without completely breaking the
algorithm.
In particular, the following parameters should be set differently from Monero:
RANDOMX_ARGON_SALT = "RandomX-TOR-v1"
The unique RandomX salt means we do not need to use a separate salt as PoW input
as specified in § 3.2.
RANDOMX_ARGON_ITERATIONS = 1
RANDOMX_CACHE_ACCESSES = 4
RANDOMX_DATASET_BASE_SIZE = 1073741824
RANDOMX_DATASET_EXTRA_SIZE = 16777216
These 4 changes reduce the RandomX Dataset size to ~1 GiB, which allows
the number of iterations to be reduced from 8 to 4. The combined effect
is that Dataset initialization becomes 4 times faster, which is needed
due to more frequent updates of the seed (Monero updates once per ~3
days).
RANDOMX_PROGRAM_COUNT = 2
RANDOMX_SCRATCHPAD_L3 = 1048576
Additionally, reducing the number of programs from 8 to 2 makes the hash
calculation about 4 times faster, while still providing resistance against
program filtering strategies (see [REF_RANDOMX_PROGRAMS]). Since there are
4 times fewer writes, we also have to reduce the scratchpad size. I
suggest using a 1 MiB scratchpad size as a compromise between scratchpad
write density and
memory hardness. Most x86 CPUs will perform roughly the same with a 512 KiB and
1024 KiB scratchpad, while the larger size provides higher resistance against
specialized hardware, at the cost of possible time-memory tradeoffs (see
[REF_RANDOMX_TMTO] for details).
Lastly, we reduce the output of RandomX to just 8 bytes:
RANDOMX_HASH_SIZE = 8
64-bit preimage security is more than sufficient for proof-of-work and it allows
the result to be treated as a little-endian encoded unsigned integer for easy
effort calculation.
RandomX would be used as follows:
The service will select a 32-byte POW_SEED and initialize the cache and
the dataset:
randomx_init_cache(myCache, POW_SEED, 32);
randomx_init_dataset(myDataset, myCache, 0, randomx_dataset_item_count());
randomx_vm *myMachine = randomx_create_vm(flags, NULL, myDataset);
Then in order to validate a PoW token, we could use something like this:
int validateProof(uint32_t pow_effort, void* pow_nonce) {
    uint64_t result;
    randomx_calculate_hash(myMachine, pow_nonce, 16, &result);
    /* mulh() returns the upper 64 bits of the 128-bit product; a zero
       upper half means pow_effort * result fits in 64 bits, i.e.
       pow_effort * result <= MAX_RESULT. */
    if (mulh(pow_effort, result) == 0) {
        return 1;
    }
    return 0;
}
I suggest setting MIN_EFFORT = 10000, which takes about 1 second on my
laptop. Requests with pow_effort < MIN_EFFORT would not be validated.
I have collected some performance figures for the fast mode with the above
RandomX configuration (~1 GiB of memory is required):
H/s = hashes per second
== CPUs:
Intel Core i3-3220
- 1 thread 700 H/s
- 3 threads 1400 H/s
Intel Xeon (dual core VPS, Sandy Bridge, unknown model)
- 1 thread 2000 H/s
- 2 threads 4000 H/s
Intel Core i5-2500K (stock)
- 1 thread 2200 H/s
- 4 threads 8200 H/s
Intel Core i7-8550U (laptop)
- 1 thread 2700 H/s
- 8 threads 10000 H/s
Intel Core i7-9850H (laptop)
- 1 thread 3100 H/s
- 12 threads 16000 H/s
AMD Ryzen 1700 @ 3300MHz
- 1 thread 2300 H/s
- 16 threads 23800 H/s
AMD Ryzen 3700X @ 3300MHz
- 1 thread 2500 H/s
- 16 threads 27500 H/s
AMD Epyc 7702P
- 1 thread 2100 H/s
- 128 threads 139000 H/s
== GPUs:
NVIDIA GeForce GTX 1660 Ti (credits to SChernykh, see [REF_RANDOMX_CUDA])
- 3072 intensity 2600 H/s
According to the above results, the time to verify a single hash is
around 400-500 μs. A mid-range GPU has performance similar to a single
CPU core. Most CPUs made since 2011 have similar per-core performance,
except for low-end CPUs without hardware AES support.
References:
[REF_BIRTHDAY]: https://en.wikipedia.org/wiki/Birthday_attack#Mathematics
[REF_SUS]: https://en.wikipedia.org/wiki/Stochastic_universal_sampling
[REF_CUCKOO]: https://github.com/tromp/cuckoo
[REF_RANDOMX_PROGRAMS]:
https://github.com/tevador/RandomX/blob/master/doc/design.md#12-the-easy-pr…
[REF_RANDOMX_TMTO]: https://github.com/tevador/RandomX/issues/65
[REF_RANDOMX_CUDA]: https://github.com/SChernykh/RandomX_CUDA
Hi everyone,
The end of the Google Summer of Code period has arrived, and you can
find my GSoC final report for the CAPTCHA Monitoring project here [1].
This was my first time working with an active open source community
and I enjoyed it a lot! The feedback you have provided made the
experience worthwhile and exciting. I couldn't finish implementing all
of the features I planned, so I will be around and I plan to stay
active. I want to thank my mentors GeKo & arma and everyone who helped
me with the project!
Best,
Barkin
[1] https://gitlab.torproject.org/woswos/CAPTCHA-Monitor/-/wikis/GSoC-2020