Hi,
Thanks for writing this draft spec.
Please see my suggested changes below:
On 17 Apr 2018, at 21:23, juga juga@riseup.net wrote:
Hi,
as commented with teor and pastly, i send in-line a draft specification for the document format that the bandwidth scanner implementations should produce.
I've left my own questions/notes in square brackets.
Thanks, juga.
=======================================
Tor Bandwidth Measurements Document Format [juga: which name should we give to this document?]
That's a fine name. You can leave out the "Document" if you want.
- Scope and preliminaries
This document describes the format of Tor's bandwidth measurements document, version X.X.X [juga: which version should be this?]
It doesn't matter, so let's use semantic versioning: * the original torflow format was 1.0.0 * the format in this spec adds the "version" feature, so it is 1.1.0 (it is compatible with 1.0.0, as long as parsers ignore unrecognised lines)
and later.
Since Tor version X.X.X [juga: which tor version?]
It looks like 0.2.4.12-alpha added measured bandwidths https://gitweb.torproject.org/tor.git/tree/ChangeLog#n12710
the directory authorities use the bandwidth measurements document called "V3BandwidthsFile" and produced by Torflow [1] (format described in README.spec.txt [2]).
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.
1.2. Acknowledgements
The original bandwidth measurement scanner (Torflow) and format was created by mike. Teor suggested to write this specification while contributing on pastly's new bandwidth scanner implementation.
This specification was revised after feedback from:
XXX
1.3 Outline
The bandwidth measurements mentioned in sections 3.4.1 and 3.4.2 of dir-spec.txt [3] are obtained by bandwidth authorities, which are either directory authorities or other servers running bandwidth measurement scanners and sending the results to the former. [juga: it seems that bandwidth authorities have not been formally before]
You could use the definition in the man page: "the bandwidth-authority generated file storing information on relays' measured bandwidth capacities"
- Format details
Bandwidth measurements MUST contain the following sections:
- Header (exactly once)
- Relays measurements (zero or more times)
Each section (or entry) ends with a separator.
This line is a copy-paste error, it should be deleted.
2.1. Nonterminals
The following nonterminals are defined in the Onionoo details document specification [4]:
fingerprint nickname
This file format gets the fingerprint and nickname from the consensus, so you should reference dir-spec.txt.
(dir-list-spec.txt gets relay fingerprints and nicknames from Onionoo. That's why it uses the Onionoo definitions.)
Here are the definitions of hexdigest (fingerprint) and nickname: https://gitweb.torproject.org/torspec.git/tree/dir-spec.txt#n1268
In the bandwidth measurement documents nickname is optional.
"optional" is not relevant in a definition. Let's delete this line, it's already documented as optional later on.
The following nonterminals are defined in the in dir-spec.txt:
NL (newline) SP (space)
"bw" = INT, the aggregated measured bandwidth of this relay, in kilobytes per second.
bw is not defined in dir-spec.txt. And the formatting is confusing. Double quotes are used for ASCII literal strings in dir-spec.txt. Can you please follow the format used in dir-spec.txt?
Here is one example: https://gitweb.torproject.org/torspec.git/tree/dir-spec.txt#n210
Here's how you can define bw using the Int definition from dir-spec.txt: https://gitweb.torproject.org/torspec.git/tree/dir-spec.txt#n795
bw = Int
bw is the aggregated measured bandwidth of this relay, in kilobytes per second.
We introduce the following nonterminals: [juga: this should probably be defined more formally and should probably link to other documents, which ones?]
dir-spec.txt
"version" = The name and the version of the bandwidth scannner software, such as "sbws 0.1.0".
Our newest spec uses "version" for the file format version: https://gitweb.torproject.org/torspec.git/tree/dir-list-spec.txt#n148
So please don't make a field with a different meaning and structure, and call it "version".
I suggest: * use "version" for the file format version (or don't use "version") * use "source" for the implementation software name and version
Please fix the formatting of this definition to be like dir-spec.txt. This definition has two arguments separated by spaces, the name, and the version.
The name of the software, if absent, is assumed to be "torflow". [juga: which should be the version if absent?]
"if absent" is not relevant in a definition. Let's move these lines to the header section.
The software version should be optional. Torflow does not have a version, so we cannot require a version.
"timestamp" = INT, the Unix Epoch time when the file was created.
Please fix the formatting of this definition to be like dir-spec.txt.
2.2. Header format
We should say if order matters. We should say how new items get added to the header. (We could say that parsers MUST ignore unrecognised lines.)
It MUST consists of:
"timestamp" timestamp NL "version" version NL
The sbws sample data has: 1523911758 version=0.1.0
The first line does not have a "timestamp" string literal. The second line has an equals sign. The second line is optional (see the torflow sample data).
Does Tor ignore the version line? If it does, we should document it.
2.3. Relay measurements format
You should say that order on a line doesn't matter, and relay order also doesn't matter.
Relays measurements MUST consist of the following items.
"node_id" fingerprint SP "bw" bandwidth SP
The format has equals signs, but this definition does not.
When there are no more items, the "bw" item ends with NL instead of SP.
It might be easier to say that each line allows extra arguments, and reference the dir-spec.txt definition: https://gitweb.torproject.org/torspec.git/tree/dir-spec.txt#n261
But does each argument need an equals sign?
2.4. Optional extra items
Different implementations of the bandwidth measurements scanners MAY include other items per relay.
For instance, sbws includes:
"rtt" = INT, Round Trip Time (to obtain 1B)
This definition belongs in the definition section. Please fix the formatting of this definition to be like dir-spec.txt.
Every relay measurement in sbws consists of:
"node_id" fingerprint SP "bw" bandwidth SP
The format has equals signs, but this definition does not.
The fingerprints in the sample data have $ signs. Does Tor require them? Or are they optional? We should document it either way.
"nick=" nickname SP "rtt=" rtt SP "time=" timestamp NL
The equals signs are correct here.
Every relay measurement in Torflow consists of:
"node_id" fingerprint SP "bw" bandwidth SP
The format has equals signs, but this definition does not.
"nick=" nickname SP "measured_at=" slice timestamp NL
slice is not defined, just use "timestamp", then explain using the next line.
The "measured_at" does not correspond to the "time" in sbws.
Is it worth explaining the difference?
[juga: actually, if bwauths use "measured_at", then the code on them or sbws should be changed].
Tor does not contain the string "measured_at": https://github.com/torproject/tor/search?q=measured_at
For consistency, please remove "measured_at", or add "updated_at".
Torflow includes other items that are out of the scope of this document.
We should think about which torflow fields are worth documenting.
References:
https://gitweb.torproject.org/torflow.git/tree/NetworkScanners/BwAuthority/R... 3. https://gitweb.torproject.org/torspec.git/tree/dir-spec.txt 4. https://metrics.torproject.org/onionoo.html#details
A. Sample data
Maybe the sample data should contain more than one relay?
A.1. Torflow
1523911758 node_id=$68A483E05A2ABDCA6DA5A3EF8DB5177638A27F80 bw=392760 nick=Test measured_at=1523911725 updated_at=1523911725 pid_error=4.11374090719 pid_error_sum=4.11374090719 pid_bw=57136645 pid_delta=2.12168374577 circ_fail=0.2 scanner=/filepath
A.2. sbws
1523911758 version=0.1.0 node_id=$68A483E05A2ABDCA6DA5A3EF8DB5177638A27F80 bw=392760 nick=Test rtt=380 time=1523911725
T