Hi,

Thanks for writing this draft spec.

Please see my suggested changes below:

On 17 Apr 2018, at 21:23, juga <juga@riseup.net> wrote:

Hi,

as commented with teor and pastly, i send in-line a draft specification
for the document format that the bandwidth scanner implementations
should produce.

I've left my own questions/notes in square brackets.

Thanks,
juga.

=======================================

         Tor Bandwidth Measurements Document Format
         [juga: which name should we give to this document?]

That's a fine name.
You can leave out the "Document" if you want.

1. Scope and preliminaries

 This document describes the format of Tor's bandwidth measurements
 document, version X.X.X [juga: which version should be this?]

It doesn't matter, so let's use semantic versioning:
* the original torflow format was 1.0.0
* the format in this spec adds the "version" feature, so it is 1.1.0
  (it is compatible with 1.0.0, as long as parsers ignore unrecognised
  lines)

 and later.

 Since Tor version X.X.X [juga: which tor version?]

It looks like 0.2.4.12-alpha added measured bandwidths
https://gitweb.torproject.org/tor.git/tree/ChangeLog#n12710

the directory
 authorities use the bandwidth measurements document called
 "V3BandwidthsFile" and produced by Torflow [1]
 (format described in README.spec.txt [2]).

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
   NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   RFC 2119.

1.2. Acknowledgements

 The original bandwidth measurement scanner (Torflow) and format was
 created by mike. Teor suggested to write this specification while
 contributing on pastly's new bandwidth scanner implementation.

 This specification was revised after feedback from:

   XXX

1.3 Outline

 The bandwidth measurements mentioned in sections 3.4.1 and 3.4.2
 of dir-spec.txt [3] are obtained by bandwidth authorities, which are
 either directory authorities or other servers running bandwidth
 measurement scanners and sending the results to the former.
 [juga: it seems that bandwidth authorities have not been formally
 before]

You could use the definition in the man page:
"the bandwidth-authority generated file storing information on
relays' measured bandwidth capacities"

2. Format details

 Bandwidth measurements MUST contain the following sections:
 - Header (exactly once)
 - Relays measurements (zero or more times)

 Each section (or entry) ends with a separator.

This line is a copy-paste error, it should be deleted.

2.1. Nonterminals

 The following nonterminals are defined in the Onionoo details
 document specification [4]:

   fingerprint
   nickname

This file format gets the fingerprint and nickname from the
consensus, so you should reference dir-spec.txt.

(dir-list-spec.txt gets relay fingerprints and nicknames from
Onionoo. That's why it uses the Onionoo definitions.)

Here are the definitions of hexdigest (fingerprint) and nickname:
https://gitweb.torproject.org/torspec.git/tree/dir-spec.txt#n1268

 In the bandwidth measurement documents nickname is optional.

"optional" is not relevant in a definition.
Let's delete this line, it's already documented as optional later on.

 The following nonterminals are defined in the in dir-spec.txt:

   NL      (newline)
   SP      (space)

   "bw" = INT, the aggregated measured bandwidth of this relay, in
   kilobytes per second.

bw is not defined in dir-spec.txt. And the formatting is confusing.
Double quotes are used for ASCII literal strings in dir-spec.txt.
Can you please follow the format used in dir-spec.txt?

Here is one example:
https://gitweb.torproject.org/torspec.git/tree/dir-spec.txt#n210

Here's how you can define bw using the Int definition from
dir-spec.txt:
https://gitweb.torproject.org/torspec.git/tree/dir-spec.txt#n795

bw = Int

bw is the aggregated measured bandwidth of this relay, in kilobytes
per second.

 We introduce the following nonterminals:
 [juga: this should probably be defined more formally and should
 probably link to other documents, which ones?]

dir-spec.txt

   "version" = The name and the version of the bandwidth scannner
   software, such as "sbws 0.1.0".

Our newest spec uses "version" for the file format version:
https://gitweb.torproject.org/torspec.git/tree/dir-list-spec.txt#n148

So please don't make a field with a different meaning and structure,
and call it "version".

I suggest:
* use "version" for the file format version (or don't use "version")
* use "source" for the implementation software name and version

Please fix the formatting of this definition to be like dir-spec.txt.
This definition has two arguments separated by spaces, the name,
and the version.

   The name of the software, if absent, is assumed to be "torflow".
   [juga: which should be the version if absent?]

"if absent" is not relevant in a definition.
Let's move these lines to the header section.

The software version should be optional.
Torflow does not have a version, so we cannot require a version.

   "timestamp" = INT, the Unix Epoch time when the file was created.

Please fix the formatting of this definition to be like dir-spec.txt.

2.2. Header format

We should say if order matters.
We should say how new items get added to the header.
(We could say that parsers MUST ignore unrecognised lines.)

 It MUST consists of:

   "timestamp" timestamp NL
   "version" version NL

The sbws sample data has:
1523911758
version=0.1.0

The first line does not have a "timestamp" string literal.
The second line has an equals sign.
The second line is optional (see the torflow sample data).

Does Tor ignore the version line?
If it does, we should document it.

2.3. Relay measurements format

You should say that order on a line doesn't matter, and relay order
also doesn't matter.

 Relays measurements MUST consist of the following items.

   "node_id" fingerprint SP
   "bw" bandwidth SP

The format has equals signs, but this definition does not.

 When there are no more items, the "bw" item ends with NL instead of
 SP.

It might be easier to say that each line allows extra arguments, and
reference the dir-spec.txt definition:
https://gitweb.torproject.org/torspec.git/tree/dir-spec.txt#n261

But does each argument need an equals sign?

2.4. Optional extra items

 Different implementations of the bandwidth measurements scanners MAY
 include other items per relay.

 For instance, sbws includes:

   "rtt" = INT, Round Trip Time (to obtain 1B)

This definition belongs in the definition section.
Please fix the formatting of this definition to be like dir-spec.txt.

 Every relay measurement in sbws consists of:

   "node_id" fingerprint SP
   "bw" bandwidth SP

The format has equals signs, but this definition does not.

The fingerprints in the sample data have $ signs.
Does Tor require them? Or are they optional?
We should document it either way.

   "nick=" nickname SP
   "rtt=" rtt SP
   "time=" timestamp NL

The equals signs are correct here.

 Every relay measurement in Torflow consists of:

   "node_id" fingerprint SP
   "bw" bandwidth SP

The format has equals signs, but this definition does not.

   "nick=" nickname SP
   "measured_at=" slice timestamp NL

slice is not defined, just use "timestamp", then explain using
the next line.

 The "measured_at" does not correspond to the "time" in sbws.

Is it worth explaining the difference?

 [juga: actually, if bwauths use "measured_at", then the code on them
 or sbws should be changed].

Tor does not contain the string "measured_at":

For consistency, please remove "measured_at", or add "updated_at".

 Torflow includes other items that are out of the scope of this
 document.

We should think about which torflow fields are worth documenting.


Maybe the sample data should contain more than one relay?

A.1. Torflow

1523911758
node_id=$68A483E05A2ABDCA6DA5A3EF8DB5177638A27F80 bw=392760 nick=Test
measured_at=1523911725 updated_at=1523911725 pid_error=4.11374090719
pid_error_sum=4.11374090719 pid_bw=57136645 pid_delta=2.12168374577
circ_fail=0.2 scanner=/filepath

A.2. sbws

1523911758
version=0.1.0
node_id=$68A483E05A2ABDCA6DA5A3EF8DB5177638A27F80 bw=392760 nick=Test
rtt=380 time=1523911725

T