Hi,
After teor's revision, here is the second version, pasted below.
Changes can be seen in: https://github.com/juga0/torspec/commits/bandwidth-file-spec
Best, juga
=================================================================
Tor Bandwidth Measurements Document Format
juga, teor
1. Scope and preliminaries
This document describes the format of Tor's bandwidth measurements document, version 1.0.0 and later.
Since Tor version 0.2.4.12-alpha, the directory authorities have used the bandwidth measurements document called "V3BandwidthsFile", produced by Torflow [1] (format described in README.spec.txt [2]).
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.
1.2. Acknowledgements
The original bandwidth measurement scanner (Torflow) and format were created by mike. Teor suggested writing this specification while contributing to pastly's new bandwidth scanner implementation.
This specification was revised after feedback from:
XXX
1.3. Outline
The bandwidth measurements mentioned in sections 3.4.1 and 3.4.2 of "Tor directory protocol" (dir-spec.txt) [3] are obtained by bandwidth authorities, which generate a file storing information on relays' measured bandwidth capacities.
1.4. Format Versions
1.0.0 - The legacy fallback bandwidth measurements document format
1.1.0 - Adds key_value lines to the header (the format version line and optional lines) and a header section separator.
2. Format details
Bandwidth measurements MUST contain the following sections:
- Header (exactly once)
- Relays measurements (zero or more times)
2.1. Definitions
The following nonterminals are defined in dir-spec.txt, sections 1.2., 2.1.1., 2.1.3.:
Int
SP (space)
NL (newline)
Keyword
ArgumentChar
fingerprint (hexdigest)
nickname
Nonterminals defined in "Tor Directory List Format" (dir-list-spec.txt), section 2.2.1.:
version_number
We define the following nonterminals:
value ::= ArgumentChar+
key_value ::= Keyword "=" value
line ::= ArgumentChar* NL
timestamp ::= Int
bandwidth ::= Int
relay_line ::= key_value (SP key_value)* NL
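As an illustration only, here is a minimal Python sketch that tokenizes one relay_line. The function name `parse_relay_line` is hypothetical. The sketch assumes two things the grammar above does not yet say: that a value never contains SP, so SP can separate pairs, and that keys consist of letters, digits, "_" and "-" (the keys used in this document contain "_"). Following section 2.3, it also rejects repeated keys.

```python
import re

# Assumption: keys use letters, digits, "_" or "-"; a value is a non-empty run
# of printable ASCII characters other than SP ("!" through "~").
KEY_VALUE = re.compile(r"([A-Za-z0-9_-]+)=([!-~]+)")


def parse_relay_line(line):
    """Split a relay_line (including its trailing NL) into key -> value.

    Returns None if the line does not conform, including when a key
    appears more than once in the same line.
    """
    if not line.endswith("\n"):
        return None
    pairs = {}
    for token in line[:-1].split(" "):
        match = KEY_VALUE.fullmatch(token)
        if match is None:
            return None  # empty token (e.g. double SP) or malformed pair
        key, value = match.groups()
        if key in pairs:
            return None  # duplicate keys are not allowed
        pairs[key] = value
    return pairs
```

Splitting on single spaces keeps the sketch strict, so an empty token produced by a doubled SP makes the whole line non-conforming.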
2.2. Header format
Some header lines MUST appear in specific positions, as documented below. All other lines can appear in any order.
There MUST NOT be multiple key_value header lines with the same key.
It consists of:
timestamp NL
[At start, exactly once.]
The Unix Epoch time in seconds when the file was created.
"version=" version_number NL
[In second position, zero or one time.]
The specification document format version. It uses semantic versioning [5].
This line has been added in version 1.1.0 of this specification.
Version 1.0.0 documents do not contain this line, and the version_number is considered to be "1.0.0".
"software=" value NL
[Zero or one time.]
The name of the software that created the document.
This line has been added in version 1.1.0 of this specification.
Version 1.0.0 documents do not contain this line, and the software is considered to be "torflow".
"software_version=" value NL
[Zero or one time.]
The version of the software that created the document. The version may be a version_number, a git commit, or some other version scheme.
This line has been added in version 1.1.0 of this specification.
"scanner_started=" timestamp NL
[Zero or one time.]
The Unix Epoch time in seconds when the scanner that generates the measurements document started.
This line has been added in version 1.1.0 of this specification.
"earliest_measurement=" timestamp NL
[Zero or one time.]
The Unix Epoch time in seconds when the first relay measurement was obtained.
This line has been added in version 1.1.0 of this specification.
key_value NL
[Zero or more times.]
Future format versions may include additional key_value header lines. Additional header lines will be accompanied by a minor version increment.
Implementations MAY add additional header lines as needed. This specification SHOULD be updated to avoid conflicting meanings for the same header keys.
Parsers MUST NOT rely on the order of these additional lines.
Additional header lines MUST NOT use any keywords specified in the relay measurements format.
If a header line does not conform to this format, the line SHOULD be ignored by parsers.
NL
[Zero or one time.]
The header ends.
This line has been added in version 1.1.0 of this specification.
For version 1.0.0 documents, the header ends when the first relay measurement line is found conforming to the next section.
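A minimal Python sketch of how a parser might find the end of the header in both format versions. All names are hypothetical. The sketch makes several assumptions. It assumes one key_value per header line, so a header value may be the rest of the line. It treats a line carrying both node_id= and bw= as the first relay line of a 1.0.0 document. It keeps the first occurrence of a duplicated key, which is a choice this specification does not make.

```python
import re

# Assumption: header keys use letters, digits, "_" or "-"; one key_value per
# header line, so the value is the rest of the line.
HEADER_KV = re.compile(r"([A-Za-z0-9_-]+)=(.+)")


def looks_like_relay_line(line):
    """True if the line carries both node_id= and bw= pairs (section 2.3)."""
    keys = {tok.split("=", 1)[0] for tok in line.split(" ") if "=" in tok}
    return "node_id" in keys and "bw" in keys


def parse_header(lines):
    """Parse the header of a document given as lines without newlines.

    Returns (timestamp, header, index_of_first_relay_line).
    """
    if not lines or not re.fullmatch(r"[0-9]+", lines[0]):
        raise ValueError("document must start with a timestamp line")
    timestamp = int(lines[0])
    header = {}
    index = 1
    while index < len(lines):
        line = lines[index]
        if line == "":
            index += 1  # 1.1.0: an empty line terminates the header
            break
        if looks_like_relay_line(line):
            break  # 1.0.0: the header ends at the first relay line
        match = HEADER_KV.fullmatch(line)
        if match:
            key, value = match.groups()
            # "version" is only valid in second position; other duplicated
            # keys keep their first value (a choice, not the spec).
            if (key != "version" or index == 1) and key not in header:
                header[key] = value
        # Non-conforming header lines are ignored.
        index += 1
    if "version" not in header:
        header["version"] = "1.0.0"
        header.setdefault("software", "torflow")
    return timestamp, header, index
```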
2.3. Relay measurements format
It consists of zero or more relay_lines containing the measurement results of relays, in arbitrary order.
There can be at most one relay_line per relay identity (fingerprint).
There MUST NOT be multiple key_value pairs with the same key in the same relay_line.
Each relay_line MUST include the following key_value pairs, in arbitrary order:
"node_id=" fingerprint
[Exactly once.]
The fingerprint of the relay being measured.
"bw=" bandwidth
[Exactly once.]
The measured bandwidth of this relay.
Tor accepts zero bandwidths, but they trigger bugs in older Tor implementations. Therefore, implementations SHOULD NOT produce zero bandwidths. Instead, they SHOULD use one as their minimum bandwidth.
Multiple measurements can be aggregated using an averaging scheme, such as a mean, median, or decaying average.
Torflow scales bandwidths to kilobytes per second. Other implementations SHOULD use kilobytes per second for their initial bandwidth scaling.
If different implementations or configurations are used in votes for the same network, their measurements MAY need further scaling. See Appendix B for information about scaling, and one possible scaling method.
key_value
[Zero or more times.]
Future format versions may include additional key_value pairs on a relay_line. Additional key_value pairs will be accompanied by a minor version increment.
Implementations MAY add additional relay key_value pairs as needed. This specification SHOULD be updated to avoid conflicting meanings for the same relay keys.
Parsers MUST NOT rely on the order of these additional key_value pairs.
Additional key_value pairs MUST NOT use any keywords specified in the header format.
If a relay line does not conform to this format, the line SHOULD be ignored by parsers.
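A minimal Python sketch of a consumer applying the rules in this section. The function name `parse_relay_lines` is hypothetical. The sketch requires exactly one node_id and one bw per line, rejects repeated keys, and ignores non-conforming lines. It assumes values never contain SP. It does not validate the fingerprint itself. When a fingerprint appears on more than one line, it keeps the first line, which is an illustrative choice.

```python
import re

DIGITS = re.compile(r"[0-9]+")


def parse_relay_lines(lines):
    """Map node_id -> bw for relay lines given without trailing newlines.

    A line is skipped if it has a malformed pair, a repeated key, a
    missing node_id or bw, or a bw that is not a non-negative integer.
    """
    measurements = {}
    for line in lines:
        pairs = {}
        for token in line.split(" "):
            key, sep, value = token.partition("=")
            if not key or not sep or not value or key in pairs:
                pairs = None  # malformed pair or repeated key
                break
            pairs[key] = value
        if not pairs or "node_id" not in pairs or "bw" not in pairs:
            continue  # non-conforming lines are ignored
        if not DIGITS.fullmatch(pairs["bw"]):
            continue
        # At most one relay_line per fingerprint: keep the first one seen.
        measurements.setdefault(pairs["node_id"], int(pairs["bw"]))
    return measurements
```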
2.4. Implementation notes
2.4.1. Simple Bandwidth Scanner
Every relay measurement in sbws version 0.1.0 consists of:
"node_id=" fingerprint SP
As above.
"bw=" bandwidth SP
As above.
"nick=" nickname SP
[Exactly once.]
The relay nickname.
"rtt=" Int SP
[Exactly once.]
The Round Trip Time in milliseconds to obtain 1 byte of data.
"time=" timestamp NL
[Exactly once.]
The Unix Epoch time in seconds when the last measurement was performed.
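For illustration, a Python sketch that emits a relay line in the sbws 0.1.0 field order listed above. The function `format_sbws_relay_line` is hypothetical and is not sbws code. Raising zero bandwidths to one follows the SHOULD in section 2.3; this document does not say whether sbws 0.1.0 does this.

```python
def format_sbws_relay_line(fingerprint, bw, nick, rtt_ms, time):
    """Build one relay line: node_id, bw, nick, rtt, time, then NL.

    fingerprint is expected to include its leading "$"; bw is clamped
    to a minimum of one, as section 2.3 recommends.
    """
    return "node_id={} bw={} nick={} rtt={} time={}\n".format(
        fingerprint, max(int(bw), 1), nick, int(rtt_ms), int(time))
```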
2.4.2. Torflow
Torflow relay lines include node_id and bw, and other key_value pairs [2].
References:
1. https://gitweb.torproject.org/torflow.git
2. https://gitweb.torproject.org/torflow.git/tree/NetworkScanners/BwAuthority/R...
3. https://gitweb.torproject.org/torspec.git/tree/dir-spec.txt
4. https://metrics.torproject.org/onionoo.html#details
5. https://semver.org/
A. Sample data
The following has not been obtained from any real measurement.
A.1. Generated by Torflow
This is an example version 1.0.0 document:
1523911758
node_id=$68A483E05A2ABDCA6DA5A3EF8DB5177638A27F80 bw=760 nick=Test measured_at=1523911725 updated_at=1523911725 pid_error=4.11374090719 pid_error_sum=4.11374090719 pid_bw=57136645 pid_delta=2.12168374577 circ_fail=0.2 scanner=/filepath
node_id=$96C15995F30895689291F455587BD94CA427B6FC bw=189 nick=Test2 measured_at=1523911623 updated_at=1523911623 pid_error=3.96703337994 pid_error_sum=3.96703337994 pid_bw=47422125 pid_delta=2.65469736988 circ_fail=0.0 scanner=/filepath
A.2. Generated by sbws version 0.1.0
1523911758
version=1.1.0
software=sbws
software_version=0.1.0
scanner_started=1523911756
earliest_measurement=1523911757

node_id=$68A483E05A2ABDCA6DA5A3EF8DB5177638A27F80 bw=760 nick=Test rtt=380 time=1523911725
node_id=$96C15995F30895689291F455587BD94CA427B6FC bw=189 nick=Test2 rtt=378 time=1523911623
B. Scaling bandwidths
B.1. Scaling requirements
Tor accepts zero bandwidths, but they trigger bugs in older Tor implementations. Therefore, scaling methods SHOULD perform the following checks:
* If the total bandwidth is zero, all relays should be given equal bandwidths.
* If the scaled bandwidth is zero, it should be rounded up to one.
Initial experiments indicate that scaling may not be needed for torflow and sbws, because their measured bandwidths are similar enough already.
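A minimal Python sketch of these two checks, applied when turning unscaled bandwidths into integer votes with some scaling factor. The function name is hypothetical. When the total is zero, the sketch gives every relay a bandwidth of one, which is one possible "equal bandwidth" and is an assumption. The sketch also rounds with Python's round(), which rounds ties to even.

```python
def finalize_scaled(bandwidths, factor):
    """Apply the B.1 checks while scaling fingerprint -> bandwidth.

    factor comes from whatever scaling method is in use.
    """
    if sum(bandwidths.values()) == 0:
        # Total is zero: give all relays equal bandwidths (1 is assumed).
        return {fp: 1 for fp in bandwidths}
    # Scaled bandwidths that round to zero are raised to one.
    return {fp: max(1, round(bw * factor)) for fp, bw in bandwidths.items()}
```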
B.2. A linear scaling method
If scaling is required, here is a simple linear bandwidth scaling method, which ensures that all bandwidth votes contain approximately the same total bandwidth:
1. Calculate the relay quota by dividing the total measured bandwidth in all votes by the number of relays with measured bandwidth votes. In the public tor network, this is approximately 7500 as of April 2018. The quota should be a consensus parameter, so it can be adjusted for all scanners on the network.
2. Calculate a vote quota by multiplying the relay quota by the number of relays this bandwidth authority has measured bandwidths for.
3. Calculate a scaling factor by dividing the vote quota by the total unscaled measured bandwidth in this bandwidth authority's upcoming vote.
4. Multiply each unscaled measured bandwidth by the scaling factor.
Now, the total scaled bandwidth in the upcoming vote is approximately equal to the vote quota.
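The four steps above can be sketched in Python as follows, as an illustration only. The function and parameter names are hypothetical, and the sketch does not cover how the network-wide inputs for step 1 are collected. It also applies the B.1 checks. When the unscaled total is zero, it gives each relay the relay quota, which is one possible choice of equal bandwidth that keeps the vote total at the vote quota.

```python
def linear_scale(unscaled, total_bw_all_votes, relays_with_votes):
    """Scale fingerprint -> unscaled bandwidth using Appendix B.2.

    total_bw_all_votes and relays_with_votes are the network-wide
    inputs to step 1.
    """
    # Step 1: relay quota = total measured bandwidth / relays with votes.
    relay_quota = total_bw_all_votes / relays_with_votes
    # Step 2: vote quota = relay quota * relays this bwauth measured.
    vote_quota = relay_quota * len(unscaled)
    total_unscaled = sum(unscaled.values())
    if total_unscaled == 0:
        # B.1: equal bandwidths when the total is zero (relay quota chosen).
        return {fp: max(1, round(relay_quota)) for fp in unscaled}
    # Step 3: scaling factor = vote quota / total unscaled bandwidth.
    factor = vote_quota / total_unscaled
    # Step 4: multiply each bandwidth; round zero results up to one (B.1).
    return {fp: max(1, round(bw * factor)) for fp, bw in unscaled.items()}
```

For example, with a relay quota of 7500 (75000 / 10) and three measured relays, the vote quota is 22500. The non-zero relays share that total in proportion to their unscaled bandwidths, and the relay measured at zero is raised to one.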
B.3. Quota changes
If all scanners are using scaling, the quota can be gradually reduced or increased as needed. Smaller quotas decrease the size of uncompressed consensuses, and may decrease the size of consensus diffs and compressed consensuses. But if the relay quota is too small, some relays may be over- or under-weighted.
Hi, Juga!
This is a review of the document from https://raw.githubusercontent.com/juga0/torspec/c7f06023dd1d5d47adad128de541... , which I *think* is the same as the document you have below.
I'm reviewing this as though it were a fully new format, since I'm not sure how much we already have locked-in based on existing code, and how much is new. We might decide that backward compatibility is more important than consistency, and if so, we won't want to take all of my recommendations here.
Tor Bandwidth Measurements Document Format juga teor
1. Scope and preliminaries
This document describes the format of Tor's bandwidth measurements document, version 1.0.0 and later.
Suggestion: Maybe explicitly say "1.0.0, 1.1.0, and later"?
Since Tor version 0.2.4.12-alpha the directory authorities use the bandwidth measurements document called "V3BandwidthsFile" and produced by Torflow [1] (format described in README.spec.txt [2]).
Recommendation: "Format described in Torflow's README.spec.txt".
Explanation needed: Is this a new format, or a new specification of the existing format? Let's say so here.
Question: If this is a different format, and we're calling it version 1.0.0, what should we call the old one? But later it seems that we're introducing 1.1.0, and we're calling the old one 1.0.0.
Suggestion: let's be explicit that we're only describing the format here, and *not* describing how bwauths generate their data.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.
1.2. Acknowledgements
The original bandwidth measurement scanner (Torflow) and format was created by mike. Teor suggested to write this specification while contributing on pastly's new bandwidth scanner implementation.
This specification was revised after feedback from:
XXX
1.3 Outline
The bandwidth measurements mentioned in sections 3.4.1 and 3.4.2 of "Tor directory protocol" (dir-spec.txt) [3] are obtained by bandwidth authorities, which generate a file storing information on relays' measured bandwidth capacities.
1.4. Format Versions
1.0.0 - The legacy fallback bandwidth measurements document format
1.1.0 - Adds key_value lines to the header, format version, optional ones and section separator.
Information: Let's repeat in this section which versions of Tor can consume these versions.
2. Format details
Bandwidth measurements MUST contain the following sections:
- Header (exactly once)
- Relays measurements (zero or more times)
Grammar suggestion: "Relay measurements".
2.1. Definitions
The following nonterminals are defined in dir-spec.txt, sections 1.2., 2.1.1., 2.1.3.:
Int
SP (space)
NL (newline)
Keyword
ArgumentChar
fingerprint (hexdigest)
Does this have to start with a "$"? I think it does. Maybe we should be explicit about that.
nickname
Nonterminals defined in "Tor Directory List Format" (dir-list-spec.txt), section 2.2.1.:
version_number
We define the following nonterminals:
value ::= ArgumentChar+
key_value ::= Keyword "=" value
line ::= ArgumentChar* NL
timestamp ::= Int
bandwidth ::= Int
relay_line ::= key_value (SP key_value)* NL
2.2. Header format
Some header lines MUST appear in specific positions, as documented below. All other lines can appear in any order.
There MUST NOT be multiple key_value header lines with the same key.
Maybe this line belongs below in the key_value section?
It consists of:
timestamp NL
[At start, exactly once.] The Unix Epoch time in seconds when the file was created.
Question: Why no keyword and equal sign here? Is this a legacy thing?
Also, wouldn't it be more standard to have it be in YYYY-MM-DDTHH:MM:SS format?
"version=" version_number NL
[In second position, zero or one time.] The specification document format version. It uses semantic versioning [5]. This line has been added in version 1.1.0 of this specification. Version 1.0.0 documents do not contain this line, and the version_number is considered to be "1.0.0".
General concern: I question the use of = signs here in the headers. If we use "SP" instead, then we can reuse a lot of the same machinery tor currently uses to parse other documents.
"software=" value NL
[Zero or one time.] The name of the software that created the document. This line has been added in version 1.1.0 of this specification. Version 1.0.0 documents do not contain this line, and the software is considered to be "torflow".
"software_version=" value NL
[Zero or one time.] The version of the software that created the document. The version may be a version_number, a git commit, or some other version scheme. This line has been added in version 1.1.0 of this specification.
"scanner_started=" timestamp NL
[Zero or one time.] The Unix Epoch time in seconds when the scanner that generates the measurements document started. This line has been added in version 1.1.0 of this specification.
See note above about time format. YYYY-MM-DDTHH:MM:SS is how we specify times elsewhere in Tor.
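For comparison, here is a small Python sketch that converts between the Unix Epoch timestamps used in this draft and the YYYY-MM-DDTHH:MM:SS form mentioned above. It uses only the standard datetime module. The function names are hypothetical. It assumes the text form is always UTC, which this thread does not state.

```python
from datetime import datetime, timezone

ISO_FORMAT = "%Y-%m-%dT%H:%M:%S"


def unix_to_iso(ts):
    """Render a Unix Epoch timestamp as UTC YYYY-MM-DDTHH:MM:SS."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime(ISO_FORMAT)


def iso_to_unix(text):
    """Parse YYYY-MM-DDTHH:MM:SS, assumed to be UTC, to a Unix timestamp."""
    parsed = datetime.strptime(text, ISO_FORMAT).replace(tzinfo=timezone.utc)
    return int(parsed.timestamp())
```

With these helpers, the header timestamp 1523911758 in the samples would be written as 2018-04-16T20:49:18.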
"earliest_measurement=" timestamp NL
[Zero or one time.] The Unix Epoch time in seconds when the first relay measurement was obtained. This line has been added in version 1.1.0 of this specification.
See note above about time format.
key_value NL
[Zero or more times.] Future format versions may include additional key_value header lines. Additional header lines will be accompanied by a minor version increment.
Implementations MAY add additional header lines as needed. This specification SHOULD be updated to avoid conflicting meanings for the same header keys. Parsers MUST NOT rely on the order of these additional lines. Additional header lines MUST NOT use any keywords specified in the relay measurements format. If a header line does not conform to this format, the line SHOULD be ignored by parsers.
Suggestion: say what recipients of this document should do with unrecognized data. In general, it's good for forward compatibility to say something like, "Recipients MUST ignore key_value lines if they do not recognize the keyword. Recipients MUST ignore any extra material in a line that they do not recognize."
Also see suggestion above about using SP as our separator rather than "=" for consistency with other documents Tor parses.
NL
[Zero or one time.] The header ends. This line has been added in version 1.1.0 of this specification. For version 1.0.0 documents, the header ends when the first relay measurement line is found conforming to the next section.
Suggestion: Replace this empty line with an explicit keyword, for consistency with other documents.
2.3. Relay measurements format
It consists of zero or more relay_line with the measurement results of relays in arbitrary order.
There can be at most one relay_line per relay identity (fingerprint).
There MUST NOT be multiple key_value pairs with the same key in the same relay_line.
Each relay_line MUST include the following key_value in arbitrary order:
Do existing implementations accept arbitrary order here?
"node_id=" fingerprint
[Exactly once.] The fingerprint of the relay being measured.
Suggestion: Add a field to hold the Ed25519 Identity of the relay being measured. Say that implementations SHOULD include both RSA fingerprint and Ed25519 identity, and that implementations SHOULD accept lines that contain at least one of them.
"bw=" bandwidth
[Exactly once.] The measured bandwidth of this relay. Tor accepts zero bandwidths, but they trigger bugs in older Tor implementations. Therefore, implementations SHOULD NOT produce zero bandwidths. Instead, they SHOULD use one as their minimum bandwidth. Multiple measurements can be aggregated using an averaging scheme, such as a mean, median, or decaying average. Torflow scales bandwidths to kilobytes per second. Other implementations SHOULD use kilobytes per second for their initial bandwidth scaling. If different implementations or configurations are used in votes for the same network, their measurements MAY need further scaling. See Appendix B for information about scaling, and one possible scaling method.
key_value
[Zero or more times.]
Technically, this isn't a key_value, because a "value" is made of ArgumentChar, and ArgumentChar can contain spaces. So if we were parsing "foo=abc bar=def" we might be parsing either one key_value ("foo", "abc bar=def") or two ("foo", "abc"), ("bar", "def").
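To make the ambiguity concrete, here is a small Python sketch. It assumes keys are letters, digits, "_" and "-". One pattern lets a value contain SP, as ArgumentChar does. The other pattern forbids SP in values, which is a hypothetical restriction and not the current grammar.

```python
import re

line = "foo=abc bar=def"

# value ::= ArgumentChar+ lets a value contain SP, so a greedy match sees
# one key_value whose value swallows the rest of the line.
greedy = re.fullmatch(r"([A-Za-z0-9_-]+)=(.+)", line).groups()

# If values could not contain SP (an assumed restriction), the same line
# splits into two unambiguous pairs.
strict = re.findall(r"([A-Za-z0-9_-]+)=([^ ]+)", line)

print(greedy)  # ('foo', 'abc bar=def')
print(strict)  # [('foo', 'abc'), ('bar', 'def')]
```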
Future format versions may include additional key_value pairs on a relay_line. Additional key_value pairs will be accompanied by a minor version increment.
Implementations MAY add additional relay key_value pairs as needed. This specification SHOULD be updated to avoid conflicting meanings for the same relay keys. Parsers MUST NOT rely on the order of these additional key_value pairs.
Additional key_value pairs MUST NOT use any keywords specified in the header format.
As above, let's say that a parser should ignore key_value entries with keywords that it doesn't recognize.
If a relay line does not conform to this format, the line SHOULD be ignored by parsers.
2.4. Implementation notes
2.4.1. Simple Bandwidth Scanner
Every relay measurement in sbws version 0.1.0 consists of:
"node_id=" fingerprint SP
As above.
"bw=" bandwidth SP
As above.
"nick=" nickname SP
[Exactly once.] The relay nickname.
"rtt=" Int SP
[Exactly once.] The Round Trip Time in milliseconds to obtain 1 byte of data.
"time=" timestamp NL
[Exactly once.] The Unix Epoch time in seconds when the last measurement was performed.
2.4.2. Torflow
Torflow relay lines include node_id and bw, and other key_value pairs [2].
References:
https://gitweb.torproject.org/torflow.git/tree/NetworkScanners/BwAuthority/R...
- https://gitweb.torproject.org/torspec.git/tree/dir-spec.txt
- https://metrics.torproject.org/onionoo.html#details
- https://semver.org/
A. Sample data
The following has not been obtained from any real measurement.
A.1. Generated by Torflow
This is an example version 1.0.0 document:
1523911758
node_id=$68A483E05A2ABDCA6DA5A3EF8DB5177638A27F80 bw=760 nick=Test measured_at=1523911725 updated_at=1523911725 pid_error=4.11374090719 pid_error_sum=4.11374090719 pid_bw=57136645 pid_delta=2.12168374577 circ_fail=0.2 scanner=/filepath
node_id=$96C15995F30895689291F455587BD94CA427B6FC bw=189 nick=Test2 measured_at=1523911623 updated_at=1523911623 pid_error=3.96703337994 pid_error_sum=3.96703337994 pid_bw=47422125 pid_delta=2.65469736988 circ_fail=0.0 scanner=/filepath
A.2. Generated by sbws version 0.1.0
1523911758
version=1.1.0
software=sbws
software_version=0.1.0
scanner_started=1523911756
earliest_measurement=1523911757

node_id=$68A483E05A2ABDCA6DA5A3EF8DB5177638A27F80 bw=760 nick=Test rtt=380 time=1523911725
node_id=$96C15995F30895689291F455587BD94CA427B6FC bw=189 nick=Test2 rtt=378 time=1523911623
B. Scaling bandwidths
B.1. Scaling requirements
Tor accepts zero bandwidths, but they trigger bugs in older Tor implementations. Therefore, scaling methods SHOULD perform the following checks:
- If the total bandwidth is zero, all relays should be given equal bandwidths.
- If the scaled bandwidth is zero, it should be rounded up to one.
Initial experiments indicate that scaling may not be needed for torflow and sbws, because their measured bandwidths are similar enough already.
B.2. A linear scaling method
If scaling is required, here is a simple linear bandwidth scaling method, which ensures that all bandwidth votes contain approximately the same total bandwidth:
1. Calculate the relay quota by dividing the total measured bandwidth in all votes by the number of relays with measured bandwidth votes. In the public tor network, this is approximately 7500 as of April 2018. The quota should be a consensus parameter, so it can be adjusted for all scanners on the network.
2. Calculate a vote quota by multiplying the relay quota by the number of relays this bandwidth authority has measured bandwidths for.
3. Calculate a scaling factor by dividing the vote quota by the total unscaled measured bandwidth in this bandwidth authority's upcoming vote.
4. Multiply each unscaled measured bandwidth by the scaling factor.
Now, the total scaled bandwidth in the upcoming vote is approximately equal to the quota.
B.3. Quota changes
If all scanners are using scaling, the quota can be gradually reduced or increased as needed. Smaller quotas decrease the size of uncompressed consensuses, and may decrease the size of consensus diffs and compressed consensuses. But if the relay quota is too small, some relays may be over- or under-weighted.
Hi Juga,
On 2018-05-01 14:36, Nick Mathewson wrote:
This is a review of the document from https://raw.githubusercontent.com/juga0/torspec/c7f06023dd1d5d47adad128de541... , which I *think* is the same as the document you have below.
I'd like to review this document format, too, in particular with regard to archiving these documents with CollecTor in the future. (Unless there are no plans to archive them, ever.)
Should I wait for you to revise the document and join in the next review round, or should I review the document now? In the latter case, where would I find the most recent version?
Thanks!
All the best, Karsten
Karsten Loesing:
Hi Juga,
On 2018-05-01 14:36, Nick Mathewson wrote:
This is a review of the document from https://raw.githubusercontent.com/juga0/torspec/c7f06023dd1d5d47adad128de541... , which I *think* is the same as the document you have below.
I'd like to review this document format, too, in particular with regard to archiving these documents with CollecTor in the future. (Unless there are no plans to archive them, ever.)
Should I wait for you to revise the document and join in the next review round, or should I review the document now?
From my side, you can review this now.
In the latter case, where would I find the most recent version?
I don't know if I interpret you correctly, but while I'm working on it (and it's not yet in the canonical torspec repo), the latest version should be in https://github.com/juga0/torspec/tree/bandwidth-file-spec.
Thanks!, juga.
Hi,
Thanks, Nick, for the comments. I'm replying only to the parts where I have an answer or more questions. I'll accept the rest of your suggestions unless there are further comments.
Nick Mathewson:
Hi, Juga!
This is a review of the document from https://raw.githubusercontent.com/juga0/torspec/c7f06023dd1d5d47adad128de541... , which I *think* is the same as the document you have below.
Yes, it is.
I'm reviewing this as though it were a fully new format, since I'm not sure how much we already have locked-in based on existing code, and how much is new. We might decide that backward compatibility is more important than consistency, and if so, we won't want to take all of my recommendations here.
Tor Bandwidth Measurements Document Format juga teor
1. Scope and preliminaries
This document describes the format of Tor's bandwidth measurements document, version 1.0.0 and later.
Suggestion: Maybe explicitly say "1.0.0, 1.1.0, and later"?
Since Tor version 0.2.4.12-alpha the directory authorities use the bandwidth measurements document called "V3BandwidthsFile" and produced by Torflow [1] (format described in README.spec.txt [2]).
Recommendation: "Format described in Torflow's README.spec.txt".
Explanation needed: Is this a new format, or a new specification of the existing format? Let's say so here.
It's a new version of the existing format, though the old version (Torflow's) didn't have a specification in the sense that this one is being written.
Question: If this is a different format, and we're calling it version 1.0.0, what should we call the old one? But later it seems that we're introducing 1.1.0, and we're calling the old one 1.0.0.
Yeah, this would be 1.1.0, and the old one (Torflow's) would be 1.0.0.
Suggestion: let's be explicit that we're only describing the format here, and *not* describing how bwauths generate their data.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.
1.2. Acknowledgements
The original bandwidth measurement scanner (Torflow) and format was created by mike. Teor suggested to write this specification while contributing on pastly's new bandwidth scanner implementation.
This specification was revised after feedback from:
XXX
1.3 Outline
The bandwidth measurements mentioned in sections 3.4.1 and 3.4.2 of "Tor directory protocol" (dir-spec.txt) [3] are obtained by bandwidth authorities, which generate a file storing information on relays' measured bandwidth capacities.
1.4. Format Versions
1.0.0 - The legacy fallback bandwidth measurements document format
1.1.0 - Adds key_value lines to the header, format version, optional ones and section separator.
Information: Let's repeat in this section which versions of Tor can consume these versions.
2. Format details
Bandwidth measurements MUST contain the following sections:
- Header (exactly once)
- Relays measurements (zero or more times)
Grammar suggestion: "Relay measurements".
2.1. Definitions
The following nonterminals are defined in dir-spec.txt, sections 1.2., 2.1.1., 2.1.3.:
Int
SP (space)
NL (newline)
Keyword
ArgumentChar
fingerprint (hexdigest)
Does this have to start with a "$"? I think it does. Maybe we should be explicit about that.
Yes
nickname
Nonterminals defined in "Tor Directory List Format" (dir-list-spec.txt), section 2.2.1.:
version_number
We define the following nonterminals:
value ::= ArgumentChar+
key_value ::= Keyword "=" value
line ::= ArgumentChar* NL
timestamp ::= Int
bandwidth ::= Int
relay_line ::= key_value (SP key_value)* NL
2.2. Header format
One more thing that teor pointed out to me: any line MUST be shorter than 512 characters (a legacy restriction). I thought it applied only to the timestamp line, but then I realized it applies to every line.
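A minimal Python sketch of how a generator or parser could check this limit. The function name is hypothetical. The sketch assumes the trailing NL does not count towards the 512, which this thread does not settle. It counts characters, which for ASCII lines is the same as counting bytes.

```python
MAX_LINE_LENGTH = 512  # every line MUST be shorter than this (legacy limit)


def check_line_lengths(document):
    """Return the 1-based numbers of lines that are too long.

    Assumption: the trailing NL is not counted towards the limit.
    """
    return [number
            for number, line in enumerate(document.split("\n"), start=1)
            if len(line) >= MAX_LINE_LENGTH]
```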
Some header lines MUST appear in specific positions, as documented below. All other lines can appear in any order.
There MUST NOT be multiple key_value header lines with the same key.
Maybe this line belongs below in the key_value section?
It consists of:
timestamp NL
[At start, exactly once.] The Unix Epoch time in seconds when the file was created.
Question: Why no keyword and equal sign here? Is this a legacy thing?
Yes, because of the way Tor [0] parses it, and the way Torflow generates it.
Also, wouldn't it be more standard to have it be in YYYY-MM-DDTHH:MM:SS format?
In that case we would need to patch current versions to accept it. Would that be OK? If so, we could also make it a key_value. We need one patch right now: changing the function in [0] to accept additional headers (ticket #25960).
"version=" version_number NL
[In second position, zero or one time.] The specification document format version. It uses semantic versioning [5]. This line has been added in version 1.1.0 of this specification. Version 1.0.0 documents do not contain this line, and the version_number is considered to be "1.0.0".
General concern: I question the use of = signs here in the headers. If we use "SP" instead, then we can reuse a lot of the same machinery tor currently uses to parse other documents.
I guess we should then see how much we would need to refactor the function in [0] to reuse parsecommon.c (as you pointed out to me on IRC).
"software=" value NL
[Zero or one time.] The name of the software that created the document. This line has been added in version 1.1.0 of this specification. Version 1.0.0 documents do not contain this line, and the software is considered to be "torflow".
"software_version=" value NL
[Zero or one time.] The version of the software that created the document. The version may be a version_number, a git commit, or some other version scheme. This line has been added in version 1.1.0 of this specification.
"scanner_started=" timestamp NL
[Zero or one time.] The Unix Epoch time in seconds when the scanner that generates the measurements document started. This line has been added in version 1.1.0 of this specification.
See note above about time format. YYYY-MM-DDTHH:MM:SS is how we specify times elsewhere in Tor.
Since this line is new, there's no problem changing it to this format.
"earliest_measurement=" timestamp NL
[Zero or one time.] The Unix Epoch time in seconds when the first relay measurement was obtained. This line has been added in version 1.1.0 of this specification.
See note above about time format.
key_value NL
[Zero or more times.] Future format versions may include additional key_value header lines. Additional header lines will be accompanied by a minor version increment.
Implementations MAY add additional header lines as needed. This specification SHOULD be updated to avoid conflicting meanings for the same header keys. Parsers MUST NOT rely on the order of these additional lines. Additional header lines MUST NOT use any keywords specified in the relay measurements format. If a header line does not conform to this format, the line SHOULD be ignored by parsers.
Suggestion: say what recipients of this document should do with unrecognized data. In general, it's good for forward compatibility to say something like, "Recipients MUST ignore key_value lines if they do not recognize the keyword. Recipients MUST ignore any extra material in a line that they do not recognize."
Also see suggestion above about using SP as our separator rather than "=" for consistency with other documents Tor parses.
NL
[Zero or one time.] The header ends. This line has been added in version 1.1.0 of this specification. For version 1.0.0 documents, the header ends when the first relay measurement line is found conforming to the next section.
Suggestion: Replace this empty line with an explicit keyword, for consistency with other documents.
It would also avoid interpreting garbage as the end of a section. Any suggestions on which keyword to use? dir-list-spec.txt uses "=====", but I don't know which ones other documents use.
2.3. Relay measurements format
As in 2.2, to be compatible with current implementations, each relay_line MUST be shorter than 512 characters.
It consists of zero or more relay_line with the measurement results of relays in arbitrary order.
There can be at most one relay_line per relay identity (fingerprint).
There MUST NOT be multiple key_value pairs with the same key in the same relay_line.
Each relay_line MUST include the following key_value in arbitrary order:
Do existing implementations accept arbitrary order here?
Good question, it seems like bw must be behind node_id, but they can have things in front and behind. I probably should create a ticket to add more test lines in [1] or include them in #25960.
"node_id=" fingerprint
[Exactly once.] The fingerprint of the relay being measured.
Suggestion: Add a field to hold the Ed25519 Identity of the relay being measured. Say that implementations SHOULD include both RSA fingerprint and Ed25519 identity, and that implementations SHOULD accept lines that contain at least one of them.
"bw=" bandwidth
[Exactly once.] The measured bandwidth of this relay. Tor accepts zero bandwidths, but they trigger bugs in older Tor implementations. Therefore, implementations SHOULD NOT produce zero bandwidths. Instead, they SHOULD use one as their minimum bandwidth. Multiple measurements can be aggregated using an averaging scheme, such as a mean, median, or decaying average. Torflow scales bandwidths to kilobytes per second. Other implementations SHOULD use kilobytes per second for their initial bandwidth scaling. If different implementations or configurations are used in votes for the same network, their measurements MAY need further scaling. See Appendix B for information about scaling, and one possible scaling method.
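A minimal sketch of the aggregation and minimum-bandwidth rules above. It assumes raw samples in bytes per second, the median as the averaging scheme, and 1 kilobyte = 1000 bytes (the text does not say whether Torflow's kilobytes are 1000 or 1024 bytes). The function name is hypothetical.

```python
import statistics


def file_bandwidth(samples_bytes_per_sec):
    """Aggregate one relay's raw samples into a bandwidth file value.

    Assumptions: samples are bytes per second, the median is the averaging
    scheme, and 1 kilobyte = 1000 bytes. Raises StatisticsError if empty.
    """
    median = statistics.median(samples_bytes_per_sec)
    kilobytes = int(median // 1000)
    # Never emit zero: older Tor versions have bugs with zero bandwidths.
    return max(1, kilobytes)
```

For samples [700000, 760000, 800000] the median is 760000 bytes/s, so the file value is 760. A very slow relay whose median rounds down to 0 KB/s is reported as 1.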
key_value
[Zero or more times.]
Technically, this isn't a key_value, because a "value" is made of ArgumentChar, and ArgumentChar can contain spaces. So if we were parsing "foo=abc bar=def" we might be parsing either one key_value ("foo", "abc bar=def") or two ("foo", "abc"), ("bar", "def").
You're right. The closest definition in dir-spec.txt is KeywordChar, but that doesn't include colons, for instance. So we would need to define what is accepted here (unless it is defined in some other document).
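If value excluded SP, tokenizing a relay_line would become unambiguous: split the line on SP, then split each token on its first "=". A minimal sketch, assuming that rule (the function name is hypothetical):

```python
def parse_key_values(line):
    """Split a relay_line into (key, value) pairs, assuming value excludes SP."""
    pairs = []
    for token in line.rstrip("\n").split(" "):
        # Split on the first "=" only; the key cannot contain "=".
        key, sep, value = token.partition("=")
        if not sep or not key or not value:
            raise ValueError(f"not a key_value: {token!r}")
        pairs.append((key, value))
    return pairs
```

Under this rule, "foo=abc bar=def" can only mean two pairs, ("foo", "abc") and ("bar", "def"), and a doubled space produces an empty token that is rejected rather than silently merged.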
Future format versions may include additional key_value pairs on a relay_line. Additional key_value pairs will be accompanied by a minor version increment. Implementations MAY add additional relay key_value pairs as needed. This specification SHOULD be updated to avoid conflicting meanings for the same relay keys. Parsers MUST NOT rely on the order of these additional key_value pairs. Additional key_value pairs MUST NOT use any keywords specified in the header format.
As above, let's say that a parser should ignore key_value entries with keywords that it doesn't recognize.
If a relay line does not conform to this format, the line SHOULD be ignored by parsers.
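A sketch of how a parser might apply the relay-line rules in this section. These are illustrative choices, not Tor's behavior: the 510-character limit from section 2.2 (not counting NL), node_id as "$" followed by 40 hex digits, lines with duplicate keys treated as nonconforming and ignored, and the first line seen for a fingerprint kept. Function names are hypothetical.

```python
import re

MAX_LINE_CHARS = 510  # section 2.2 limit, not counting the trailing NL
NODE_ID_RE = re.compile(r"\$[0-9A-Fa-f]{40}")


def parse_relay_line(raw):
    """Return a dict of fields, or None if the line does not conform."""
    line = raw.rstrip("\n")
    if len(line) > MAX_LINE_CHARS:
        return None
    fields = {}
    for token in line.split(" "):
        key, sep, value = token.partition("=")
        if not sep or not key or not value or key in fields:
            return None  # malformed token or duplicate key
        fields[key] = value
    # node_id and bw are both required; other keys are passed through.
    if not NODE_ID_RE.fullmatch(fields.get("node_id", "")):
        return None
    if not fields.get("bw", "").isdigit():
        return None
    return fields


def parse_relay_lines(lines):
    """Map node_id -> fields, ignoring nonconforming lines (sketch)."""
    relays = {}
    for raw in lines:
        fields = parse_relay_line(raw)
        if fields is None:
            continue  # nonconforming relay lines SHOULD be ignored
        # At most one line per fingerprint: keep the first one seen.
        relays.setdefault(fields["node_id"], fields)
    return relays
```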
2.4. Implementation notes
2.4.1. Simple Bandwidth Scanner
Every relay measurement in sbws version 0.1.0 consists of:
"node_id=" fingerprint SP
As above.
"bw=" bandwidth SP
As above.
"nick=" nickname SP
[Exactly once.] The relay nickname.
"rtt=" Int SP
[Exactly once.] The Round Trip Time in milliseconds to obtain 1 byte of data.
"time=" timestamp NL
[Exactly once.] The Unix Epoch time in seconds when the last measurement was performed.
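For illustration, a hypothetical helper that formats one relay line using the sbws 0.1.0 fields listed above, in the same order. It assumes bw is already in kilobytes per second and applies the "use one as the minimum" rule. It is not sbws's actual code.

```python
def sbws_relay_line(node_id, bw, nick, rtt_ms, last_measured):
    """Format one relay line in the sbws 0.1.0 field order (hypothetical helper).

    bw is in kilobytes per second; last_measured is a Unix timestamp.
    """
    fields = [
        ("node_id", node_id),
        ("bw", max(1, int(bw))),  # SHOULD NOT produce zero bandwidths
        ("nick", nick),
        ("rtt", int(rtt_ms)),
        ("time", int(last_measured)),
    ]
    return " ".join(f"{key}={value}" for key, value in fields) + "\n"
```

Called with the first relay from sample A.2 (bw 760, nick Test, rtt 380, time 1523911725), it reproduces that sample line.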
2.4.2. Torflow
Torflow relay lines include node_id and bw, and other key_value pairs [2].
References:
- https://gitweb.torproject.org/torflow.git/tree/NetworkScanners/BwAuthority/R...
- https://gitweb.torproject.org/torspec.git/tree/dir-spec.txt
- https://metrics.torproject.org/onionoo.html#details
- https://semver.org/
A. Sample data
The following has not been obtained from any real measurement.
A.1. Generated by Torflow
This is an example version 1.0.0 document:
1523911758
node_id=$68A483E05A2ABDCA6DA5A3EF8DB5177638A27F80 bw=760 nick=Test measured_at=1523911725 updated_at=1523911725 pid_error=4.11374090719 pid_error_sum=4.11374090719 pid_bw=57136645 pid_delta=2.12168374577 circ_fail=0.2 scanner=/filepath
node_id=$96C15995F30895689291F455587BD94CA427B6FC bw=189 nick=Test2 measured_at=1523911623 updated_at=1523911623 pid_error=3.96703337994 pid_error_sum=3.96703337994 pid_bw=47422125 pid_delta=2.65469736988 circ_fail=0.0 scanner=/filepath
A.2. Generated by sbws version 0.1.0
1523911758
version=1.1.0
software=sbws
software_version=0.1.0
scanner_started=1523911756
earliest_measurement=1523911757

node_id=$68A483E05A2ABDCA6DA5A3EF8DB5177638A27F80 bw=760 nick=Test rtt=380 time=1523911725
node_id=$96C15995F30895689291F455587BD94CA427B6FC bw=189 nick=Test2 rtt=378 time=1523911623
B. Scaling bandwidths
B.1. Scaling requirements
Tor accepts zero bandwidths, but they trigger bugs in older Tor implementations. Therefore, scaling methods SHOULD perform the following checks:
- If the total bandwidth is zero, all relays should be given equal bandwidths.
- If the scaled bandwidth is zero, it should be rounded up to one.
Initial experiments indicate that scaling may not be needed for torflow and sbws, because their measured bandwidths are similar enough already.
B.2. A linear scaling method
If scaling is required, here is a simple linear bandwidth scaling method, which ensures that all bandwidth votes contain approximately the same total bandwidth:
Calculate the relay quota by dividing the total measured bandwidth in all votes by the number of relays with measured bandwidth votes. In the public tor network, this is approximately 7500 as of April 2018. The quota should be a consensus parameter, so it can be adjusted for all scanners on the network.
Calculate a vote quota by multiplying the relay quota by the number of relays this bandwidth authority has measured bandwidths for.
Calculate a scaling factor by dividing the vote quota by the total unscaled measured bandwidth in this bandwidth authority's upcoming vote.
Multiply each unscaled measured bandwidth by the scaling factor.
Now, the total scaled bandwidth in the upcoming vote is approximately equal to the vote quota.
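The B.2 steps, with the B.1 checks folded in, can be sketched as follows. This is one possible reading, not a reference implementation: relay_quota is passed in rather than read from a consensus parameter, and scaled values are rounded to integers. The function name is hypothetical.

```python
def linear_scale(unscaled, relay_quota):
    """Scale one bandwidth authority's vote with the B.2 method (sketch).

    unscaled: dict mapping node_id -> unscaled measured bandwidth.
    relay_quota: total measured bandwidth in all votes divided by the
    number of relays with measured bandwidth votes.
    """
    if not unscaled:
        return {}
    # Vote quota = relay quota * relays this authority measured.
    vote_quota = relay_quota * len(unscaled)
    total = sum(unscaled.values())
    if total == 0:
        # B.1: if the total is zero, give all relays equal bandwidths.
        equal = max(1, round(relay_quota))
        return {node_id: equal for node_id in unscaled}
    # Scaling factor = vote quota / total unscaled bandwidth in this vote.
    factor = vote_quota / total
    # B.1: round any scaled bandwidth of zero up to one.
    return {node_id: max(1, round(bw * factor)) for node_id, bw in unscaled.items()}
```

Worked example: with a relay quota of 7500 and unscaled bandwidths A=600, B=300, C=100, the vote quota is 3 × 7500 = 22500, the scaling factor is 22500 / 1000 = 22.5, and the scaled bandwidths are 13500, 6750 and 2250, which sum to the vote quota.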
B.3. Quota changes
If all scanners are using scaling, the quota can be gradually reduced or increased as needed. Smaller quotas decrease the size of uncompressed consensuses, and may decrease the size of consensus diffs and compressed consensuses. But if the relay quota is too small, some relays may be over- or under-weighted.
[0] https://gitweb.torproject.org/tor.git/tree/src/or/dirserv.c#n2563
[1] https://gitweb.torproject.org/torspec.git/tree/dir-list-spec.txt#n131
[2] https://gitweb.torproject.org/tor.git/tree/src/test/test_dir.c#n1495
juga:
Each relay_line MUST include the following key_value in arbitrary order:
Do existing implementations accept arbitrary order here?
Good question, it seems like bw must be behind node_id, but they can have things in front and behind. I probably should create a ticket to add more test lines in [1] or include them in #25960.
Checked: in the current implementation, the only order required is that bw must appear before node_id. It probably does not make sense, but to be compatible with it, it is what this spec should say.
[1] https://gitweb.torproject.org/torspec.git/tree/dir-list-spec.txt#n131
Hi,
Tor Bandwidth Measurements Document Format
"Measurement" could mean a method for performing a measurement, a single measurement task, a schedule for a repeating measurement task, a measurement result or a few other things.
When Large MeAsurement Platforms (LMAP) wrote documents in the IETF, they only ever used measurement as an adjective to avoid any ambiguity.
https://www.ietf.org/archive/id/draft-eardley-lmap-terminology-02.txt
The architecture for LMAP may not fit well with the bandwidth scanner architecture, and so I'm not suggesting we adopt the terminology in that document throughout.
- Format details
Bandwidth measurements MUST contain the following sections:
- Header (exactly once)
- Relays measurements (zero or more times)
Grammar suggestion: "Relay measurements".
In this case, this would become "Relay measurement result".
If desirable, I'd be happy to check through the document for any other places ambiguities pop up, but I'll let others finish having their comments integrated first.
Thanks, Iain.
Hi Iain,
Iain Learmonth:
Hi,
Tor Bandwidth Measurements Document Format
"Measurement" could mean a method for performing a measurement, a single measurement task, a schedule for a repeating measurement task, a measurement result or a few other things.
I also wondered whether that was the correct word and considered "capacity", but it didn't convince me. Teor also suggested that I remove "Document", but I thought I'd keep it, to convey that the spec is only about the "file" and not about the process or how the measurements are formatted elsewhere.
Do you have a suggestion for another word to use instead of "measurements"?
- Format details
Bandwidth measurements MUST contain the following sections:
- Header (exactly once)
- Relays measurements (zero or more times)
Grammar suggestion: "Relay measurements".
In this case, this would become "Relay measurement result".
It's more accurate, though it starts becoming a bit too long. The title should probably then become: "Tor Bandwidth Measurements Results Document Format". Any shorter suggestion?
If desirable, I'd be happy to check through the document for any other places ambiguities pop up, but I'll let others finish having their comments integrated first.
It's fine to continue to make comments on the thread where others commented, no need to wait until those comments are integrated. But either way works.
Thanks for your comments! juga
On 2 May 2018, at 18:34, juga juga@riseup.net wrote:
- Format details
Bandwidth measurements MUST contain the following sections:
- Header (exactly once)
- Relays measurements (zero or more times)
Grammar suggestion: "Relay measurements".
In this case, this would become "Relay measurement result".
It's more accurate, though it starts becoming a bit too long. The title should probably then become: "Tor Bandwidth Measurements Results Document Format". Any shorter suggestion?
"Measurements Results" describes how the bandwidths are created by some generators. But a generator that believes self-reported results doesn't measure, it just aggregates. (As does a peerflow-style generator.)
"Document" is vague. Let's describe what the document is: a list.
Let's use: Tor Bandwidth List Format
What is the document? A Tor Bandwidth List
How do I parse it? Using the Tor Bandwidth List Format
Are there any similar formats? The Tor Directory List Format https://gitweb.torproject.org/torspec.git/tree/dir-list-spec.txt
T
Hi,
On 02/05/18 09:59, teor wrote:
Let's use: Tor Bandwidth List Format
As we are already using this for the directory lists, I think this makes sense as a name for the format.
"Measurements Results" describes how the bandwidths are created by some generators. But a generator that believes self-reported results doesn't measure, it just aggregates. (As does a peerflow-style generator.)
I'm not sure I understand this. Are you saying that the format will be used to aggregate results that are collected? In this case, I think the results can still be called results in that they correspond to an active measurement of a relay and have a value.
Thanks, Iain.
On 2 May 2018, at 19:18, Iain Learmonth irl@torproject.org wrote:
"Measurements Results" describes how the bandwidths are created by some generators. But a generator that believes self-reported results doesn't measure, it just aggregates. (As does a peerflow-style generator.)
I'm not sure I understand this. Are you saying that the format will be used to aggregate results that are collected? In this case, I think the results can still be called results in that they correspond to an active measurement of a relay and have a value.
No, I'm saying that the spec is about the format. It's not about how the numbers in a file in the format are created.
"Measurement" is one way we can create the file.
Other ways to create the file are:
* "copy" self-reported bandwidths from relay descriptors into the required format (the naive, pre-bandwidth scanner method)
* "aggregate" bandwidths passively observed by other relays into the required format (the peerflow method)
* assign all relays equal bandwidths (the fallback method in Appendix B)
So let's try to keep "relay measurement" and "relay bandwidths" as separate concepts.
T
Hi,
On 02/05/18 10:31, teor wrote:
So let's try to keep "relay measurement" and "relay bandwidths" as separate concepts.
Aaah, ok. Yes, I much prefer "Relay Bandwidth" as the name for the section in §2. There are then also lots of references to measurement in §2.2, that should also be changed to talk about bandwidths instead, e.g. "earliest_bandwidth".
Thanks, Iain.
Hi Nick,
Juga asked me to comment on your review, so she could read it before our bandwidth meeting this week. If I don't comment on a suggestion, you should assume I agree with it.
Backwards Compatibility
Nick asked about backwards compatibility. This format uses semantic versioning. Tor 0.2.9 - 0.3.3 reads format version 1.0.0. It also reads format 1.1.0, but ignores the new features with warnings.
If we want to introduce an incompatible format, we should call it 2.0.0, because semantic versioning requires a major increment for breaking changes.
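A sketch of the compatibility rule semantic versioning implies for parsers: a parser written for format 1.x.y can read any 1.x.y file (ignoring features it does not know), but must not assume it can read 2.0.0. SUPPORTED_MAJOR and the function names are hypothetical.

```python
SUPPORTED_MAJOR = 1  # assumption: this parser was written for format 1.x.y


def parse_version(text):
    """Parse "MAJOR.MINOR.PATCH" into a tuple of ints."""
    major, minor, patch = (int(part) for part in text.split("."))
    return major, minor, patch


def can_parse(version_text, supported_major=SUPPORTED_MAJOR):
    """Decide whether a 1.x.y parser can read a file of this format version.

    Minor and patch increments are backwards compatible, so a parser only
    needs a matching major version; it ignores features it does not know.
    """
    try:
        major, _minor, _patch = parse_version(version_text)
    except ValueError:
        return False  # not a MAJOR.MINOR.PATCH version string
    return major == supported_major
```

So a 1.x.y parser accepts "1.0.0" and "1.1.0" but rejects "2.0.0", which is why an incompatible format would need a major increment.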
Here's how we could add the new format:
* The new format should have a new torrc option.
* Tor should be modified to support the new format, and we should put time on the roadmap for people to work on implementing, testing, or reviewing it.
* Either we should backport the new format to the latest stable release, or sbws should produce both formats.
The current implementation has at least one security bug, some weird order restrictions, and some line length restrictions. So I would support re-implementing it using the standard directory document parsing code. Even if that takes more time.
Testing the format
Most of us don't have a spare directory authority for testing.
If you run chutney with my bwfile branch, all the authorities in the network read /tmp/bwfile for every consensus. Look for the warnings at the end of the chutney output.
The basic-min network is fast: chutney/tools/test-network.sh --flavour basic-min
Here's the branch: https://github.com/teor2345/chutney/commit/ebdb4760fbcae40979ab248e4208c27a7...
I've already found one minor security bug using this branch: #26007.
Next Steps
I'm going to be away next week for a week and a half. I encourage other people to make decisions while I'm away, so we can keep making progress.
On 1 May 2018, at 22:36, Nick Mathewson nickm@alum.mit.edu wrote:
Hi, Juga!
This is a review of the document from https://raw.githubusercontent.com/juga0/torspec/c7f06023dd1d5d47adad128de541... , which I *think* is the same as the document you have below.
I'm reviewing this as though it were a fully new format, since I'm not sure how much we already have locked-in based on existing code, and how much is new. We might decide that backward compatibility is more important than consistency, and if so, we won't want to take all of my recommendations here.
Tor Bandwidth Measurements Document Format juga teor
- Scope and preliminaries
This document describes the format of Tor's bandwidth measurements
Replace measurements document with list?
document, version 1.0.0 and later.
Suggestion: Maybe explicitly say "1.0.0, 1.1.0, and later"?
Since Tor version 0.2.4.12-alpha the directory authorities use the bandwidth measurements document called
Replace measurements document with list?
"V3BandwidthsFile" and produced by Torflow [1] (format described in README.spec.txt [2]).
Recommendation: "Format described in Torflow's README.spec.txt".
Explanation needed: Is this a new format, or a new specification of the existing format? Let's say so here.
A new specification for the existing format 1.0.0. A new format 1.1.0, which is backwards compatible with 1.0.0 parsers.
Question: If this is a different format, and we're calling it version 1.0.0, what should we call the old one? But later it seems that we're introducing 1.1.0, and we're calling the old one 1.0.0.
"The Legacy Torflow format" or just "legacy"?
Suggestion: let's be explicit that we're only describing the format here, and *not* describing how bwauths generate their data.
I agree. We want to leave room for peerflow and future schemes. So we might want to:
* replace every "measurements document" with "list"
* replace every "measurements scanner" with "generator"
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.
1.2. Acknowledgements
The original bandwidth measurement scanner (Torflow)
Replace measurement scanner with generator?
and format was created by mike. Teor suggested to write this specification while contributing on pastly's new bandwidth scanner implementation.
This specification was revised after feedback from:
XXX
Please update.
1.3 Outline
The bandwidth measurements mentioned in sections 3.4.1 and 3.4.2
Hmm, the dir-spec calls them measurements. Maybe we should fix it as well.
of "Tor directory protocol" (dir-spec.txt) [3] are obtained by bandwidth authorities,
Is a bandwidth authority a directory authority that votes for bandwidths? Or is it a bandwidth generator that produces the bandwidth file?
which generate a file storing information on relays' measured bandwidth capacities.
Remove "measured".
1.4. Format Versions
1.0.0 - The legacy fallback bandwidth measurements document format
Instead of "bandwidth measurements document format", say "bandwidth list"?
1.1.0 - Adds key_value lines to the header, format version, optional ones and section separator.
Information: Let's repeat in this section which versions of Tor can consume these versions.
All Tor versions can consume format version 1.0.0. All Tor versions can consume format version 1.1.0, but they warn on header lines. See https://trac.torproject.org/projects/tor/ticket/25960
- Format details
Bandwidth measurements MUST contain the following sections:
And if they don't, the file SHOULD be ignored.
- Header (exactly once)
- Relays measurements (zero or more times)
Grammar suggestion: "Relay measurements".
Replace "measurements" with "bandwidths"?
2.1. Definitions
The following nonterminals are defined in dir-spec.txt, sections 1.2., 2.1.1., 2.1.3.:
Int SP (space) NL (newline) Keyword ArgumentChar fingerprint (hexdigest)
Does this have to start with a "$" ? I think it does. Maybe we should be explicit about that.
It does. And we should.
nickname
Nonterminals defined in "Tor Directory List Format" (dir-list-spec.txt), section 2.2.1.:
version_number
We define the following nonterminals:
value ::= ArgumentChar+
Excluding SP
key_value ::= Keyword "=" value
line ::= ArgumentChar* NL
timestamp ::= Int
bandwidth ::= Int
relay_line ::= key_value (SP key_value)* NL
2.2. Header format
Some header lines MUST appear in specific positions, as documented below.
And if they don't, the file SHOULD be ignored.
All other lines can appear in any order.
There MUST NOT be multiple key_value header lines with the same key.
And if there are, the parser SHOULD choose an arbitrary line.
All lines in the file MUST be 510 characters or less, to allow for the trailing newline and NUL characters. (The previous limit was 254 characters in Tor 0.2.6.2-alpha and earlier.)
The parser MAY ignore longer lines.
Should we lift this restriction in 1.1.0?
Maybe this line belongs below in the key_value section?
It consists of:
timestamp NL
[At start, exactly once.] The Unix Epoch time in seconds when the file was created.
Question: Why no keyword and equal sign here? Is this a legacy thing?
Yes, tor expects a Unix timestamp on a single line by itself.
Also, wouldn't it be more standard to have it be in YYYY-MM-DDTHH:MM:SS format?
Tor refuses to read bandwidth files unless they start with an integer on a line by itself. So this would be a breaking change.
"version=" version_number NL
[In second position, zero or one time.] The specification document format version. It uses semantic versioning [5]. This line has been added in version 1.1.0 of this specification. Version 1.0.0 documents do not contain this line, and the version_number is considered to be "1.0.0".
General concern: I question the use of = signs here in the headers. If we use "SP" instead, then we can reuse a lot of the same machinery tor currently uses to parse other documents.
I think using SP is fine.
But if we want to re-use the parsing machinery, we probably need to add a keyword to the initial timestamp. That would be a breaking change.
"software=" value NL
[Zero or one time.] The name of the software that created the document. This line has been added in version 1.1.0 of this specification. Version 1.0.0 documents do not contain this line, and the software is considered to be "torflow".
"software_version=" value NL
[Zero or one time.] The version of the software that created the document. The version may be a version_number, a git commit, or some other version scheme. This line has been added in version 1.1.0 of this specification.
If we use SP as a separator, we can make these two lines:
"software" SP name_value SP version_value NL
"scanner_started=" timestamp NL
[Zero or one time.] The Unix Epoch time in seconds when the scanner that generates the measurements document started. This line has been added in version 1.1.0 of this specification.
See note above about time format. YYYY-MM-DDTHH:MM:SS is how we specify times elsewhere in Tor.
This is a new field, so we can choose the format.
"earliest_measurement=" timestamp NL
[Zero or one time.] The Unix Epoch time in seconds when the first relay measurement was obtained. This line has been added in version 1.1.0 of this specification.
See note above about time format.
key_value NL
[Zero or more times.] Future format versions may include additional key_value header lines. Additional header lines will be accompanied by a minor version increment. Implementations MAY add additional header lines as needed. This specification SHOULD be updated to avoid conflicting meanings for the same header keys. Parsers MUST NOT rely on the order of these additional lines. Additional header lines MUST NOT use any keywords specified in the relay measurements format.
And if there are, the parser MAY ignore conflicting keywords.
If a header line does not conform to this format, the line SHOULD be ignored by parsers.
Suggestion: say what recipients of this document should do with unrecognized data. In general, it's good for forward compatibility to say something like, "Recipients MUST ignore key_value lines if they do not recognize the keyword. Recipients MUST ignore any extra material in a line that they do not recognize."
We should specify what parsers should do with every MUST in the document.
Also see suggestion above about using SP as our separator rather than "=" for consistency with other documents Tor parses.
NL
[Zero or one time.] The header ends. This line has been added in version 1.1.0 of this specification. For version 1.0.0 documents, the header ends when the first relay measurement line is found conforming to the next section.
Suggestion: Replace this empty line with an explicit keyword, for consistency with other documents.
2.3. Relay measurements format
It consists of zero or more relay_line with the measurement results of relays in arbitrary order.
There can be at most one relay_line per relay identity (fingerprint).
There MUST NOT be multiple key_value pairs with the same key in the same relay_line.
And if there are, the parser SHOULD choose an arbitrary value.
Each relay_line MUST include the following key_value in arbitrary order:
Do existing implementations accept arbitrary order here?
Existing Tor implementations do not accept node_id at the end of a line. https://trac.torproject.org/projects/tor/ticket/26004
We should: * add this as a MUST NOT in 1.0.0, and * allow it in 1.1.0, with a list of tor versions that support it
If we use the standard directory parser, each relay line will have to start with a keyword. Perhaps we should use "b" or "r" or "n". This would be a breaking change.
"node_id=" fingerprint
[Exactly once.] The fingerprint of the relay being measured.
Suggestion: Add a field to hold the Ed25519 Identity of the relay being measured. Say that implementations SHOULD include both RSA fingerprint and Ed25519 identity, and that implementations SHOULD accept lines that contain at least one of them.
Suggestion: the ed25519 IDs should be base64 encoded, without a trailing =, because a trailing = makes the format ambiguous.
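A sketch of the suggested encoding, assuming the field holds the raw 32-byte ed25519 public key (not a certificate). Base64 of 32 bytes is 44 characters ending in one "=" of padding; dropping it gives 43 characters, and the padding can be restored before decoding. The function names are hypothetical.

```python
import base64


def encode_ed25519_id(public_key: bytes) -> str:
    """Base64-encode a 32-byte ed25519 public key without trailing "="."""
    if len(public_key) != 32:
        raise ValueError("ed25519 public keys are 32 bytes")
    return base64.b64encode(public_key).decode("ascii").rstrip("=")


def decode_ed25519_id(text: str) -> bytes:
    """Restore the stripped padding, then decode strictly."""
    padded = text + "=" * (-len(text) % 4)
    key = base64.b64decode(padded, validate=True)
    if len(key) != 32:
        raise ValueError("ed25519 public keys are 32 bytes")
    return key
```

Without padding, a value such as "<keyword>=<43 base64 chars>" contains exactly one "=", so it splits cleanly into keyword and value.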
"bw=" bandwidth
[Exactly once.] The measured bandwidth of this relay. Tor accepts zero bandwidths, but they trigger bugs in older Tor implementations. Therefore, implementations SHOULD NOT produce zero bandwidths. Instead, they SHOULD use one as their minimum bandwidth.
And if there are zero bandwidths, the parser MAY ignore them.
Multiple measurements can be aggregated using an averaging scheme, such as a mean, median, or decaying average. Torflow scales bandwidths to kilobytes per second. Other implementations SHOULD use kilobytes per second for their initial bandwidth scaling. If different implementations or configurations are used in votes for the same network, their measurements MAY need further scaling. See Appendix B for information about scaling, and one possible scaling method.
key_value
[Zero or more times.]
Technically, this isn't a key_value, because a "value" is made of ArgumentChar, and ArgumentChar can contain spaces. So if we were parsing "foo=abc bar=def" we might be parsing either one key_value ("foo", "abc bar=def") or two ("foo", "abc"), ("bar", "def").
Let's exclude SP from value to resolve this issue.
Future format versions may include additional key_value pairs on a relay_line. Additional key_value pairs will be accompanied by a minor version increment. Implementations MAY add additional relay key_value pairs as needed. This specification SHOULD be updated to avoid conflicting meanings for the same relay keys. Parsers MUST NOT rely on the order of these additional key_value pairs. Additional key_value pairs MUST NOT use any keywords specified in the header format.
And if there are, the parser MAY ignore conflicting keywords.
As above, let's say that a parser should ignore key_value entries with keywords that it doesn't recognize.
If a relay line does not conform to this format, the line SHOULD be ignored by parsers. …
T
On 2 May 2018, at 22:39, teor teor2345@gmail.com wrote:
Tor accepts zero bandwidths, but they trigger bugs in older Tor implementations. Therefore, implementations SHOULD NOT produce zero bandwidths. Instead, they SHOULD use one as their minimum bandwidth.
And if there are zero bandwidths, the parser MAY ignore them.
Bandwidth files also need to respect MaxAdvertisedBandwidth and RelayBandwidthRate/Burst. We need to specify that the relay descriptor bandwidth rate and burst should limit the bandwidths in the file.
Torflow supports MaxAdvertisedBandwidth by putting relays in partitions that match their bandwidth. Maybe it also does some other adjustments.
sbws can probably just do a min() using the measured bandwidth: https://github.com/pastly/simple-bw-scanner/issues/155
For details, see: https://trac.torproject.org/projects/tor/ticket/8494#comment:5
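A sketch of the min() approach suggested for sbws above. It assumes the file bandwidth is in kilobytes per second, the descriptor rate and burst are in bytes per second (as on a descriptor's "bandwidth" line), and 1 kilobyte = 1000 bytes. It does not model Torflow's partitioning. The function name is hypothetical.

```python
def capped_bandwidth(measured_kb, descriptor_rate, descriptor_burst):
    """Limit a file bandwidth by the relay's descriptor rate and burst (sketch).

    measured_kb: measured bandwidth in kilobytes per second.
    descriptor_rate, descriptor_burst: bytes per second from the descriptor.
    """
    limit_kb = min(descriptor_rate, descriptor_burst) // 1000
    # Keep the minimum of one so the file never contains zero bandwidths.
    return max(1, min(measured_kb, limit_kb))
```

A relay measured at 760 KB/s whose descriptor rate is 500000 bytes/s would be written as 500.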
T
teor:
Hi Nick,
Suggestion: Add a field to hold the Ed25519 Identity of the relay being measured. Say that implementations SHOULD include both RSA fingerprint and Ed25519 identity, and that implementations SHOULD accept lines that contain at least one of them.
Suggestion: the ed25519 IDs should be base64 encoded, without a trailing =, because a trailing = makes the format ambiguous.
You're talking about the certificate, right? This would change the concept of "line", since the certificate spans more than one "line".
This is how it is defined in dir-list-spec.txt
base64-encoded-ed25519-identity :== "-----BEGIN ED25519 CERT-----" NL certificate "-----END ED25519 CERT-----" NL
On 7 May 2018, at 06:54, juga juga@riseup.net wrote:
teor:
Hi Nick,
Suggestion: Add a field to hold the Ed25519 Identity of the relay being measured. Say that implementations SHOULD include both RSA fingerprint and Ed25519 identity, and that implementations SHOULD accept lines that contain at least one of them.
Suggestion: the ed25519 IDs should be base64 encoded, without a trailing =, because a trailing = makes the format ambiguous.
You're talking about the certificate, right? This would change the concept of "line", since the certificate spans more than one "line".
This is how it is defined in dir-list-spec.txt
base64-encoded-ed25519-identity :== "-----BEGIN ED25519 CERT-----" NL certificate "-----END ED25519 CERT-----" NL
The certificate is a proof of identity. But we only need to refer to a relay by its ed25519 public key:
"master-key-ed25519" SP MasterKey NL
[At most once]
Contains the base-64 encoded ed25519 master key as a single argument. If it is present, it MUST match the identity key in the identity-ed25519 entry.
https://gitweb.torproject.org/torspec.git/tree/dir-spec.txt#n416
T