Heyo.
We're going to have a meeting to discuss Proposal 291. See this thread:
https://lists.torproject.org/pipermail/tor-dev/2018-April/013053.html
The meeting will be at 17:00 UTC, on Wednesday, April 18th, in
#tor-meeting on irc.oftc.net. (That's 10:00 left coast, 12:00 middle
coast, 13:00 right coast, and 19:00 in several socialist paradises that
strangely do not have public water fountains.)
https://www.timeanddate.com/worldclock/fixedtime.html?iso=20180415T1700
Things we need to decide:
1. Do we abandon Tor's path restrictions?
2. Do we use two guards?
At the end of this meeting, we should commit to one or both of these
things long-term. (Surprise twist: we're already doing #2!)
Each of these choices is a nuanced thing. And just picking one or the
other doesn't solve everything. I think it's best to think of them as a
commitment to a plan over some timescale, based on the information we
have available today.
People who mos def should attend:
George Kadianakis,
Roger,
Nick,
Me
People who probably maybe should attend:
Aaron Johnson,
Isis (and others concerned about guard fingerprinting),
You?
--
Mike Perry
Hello Everyone,
We briefly talked about this idea of taking over Tor Messenger, along
with our own ideas, and were asked to mail in some
Documentation/Roadmaps. In what form would you guys like to
see this?
Basically:
Would you like us to add a .txt as an attachment in a follow up email?
Or would it be better to host the files on our site, and allow people to
read them without having to download anything?
Thank you,
~ Beard | https://twitter.com/beardlyness
Hi,
After teor's revision, the second version is pasted below.
Changes can be seen in:
https://github.com/juga0/torspec/commits/bandwidth-file-spec
Best,
juga
=================================================================
Tor Bandwidth Measurements Document Format
juga
teor
1. Scope and preliminaries
This document describes the format of Tor's bandwidth measurements
document, version 1.0.0 and later.
Since Tor version 0.2.4.12-alpha, the directory
authorities use a bandwidth measurements document, called
"V3BandwidthsFile", that is produced by Torflow [1]
(format described in README.spec.txt [2]).
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in
RFC 2119.
1.2. Acknowledgements
The original bandwidth measurement scanner (Torflow) and format were
created by mike. Teor suggested writing this specification while
contributing to pastly's new bandwidth scanner implementation.
This specification was revised after feedback from:
XXX
1.3. Outline
The bandwidth measurements mentioned in sections 3.4.1 and 3.4.2
of "Tor directory protocol" (dir-spec.txt) [3] are obtained
by bandwidth authorities, which generate a file storing information
on relays' measured bandwidth capacities.
1.4. Format Versions
1.0.0 - The legacy fallback bandwidth measurements document format
1.1.0 - Adds a format version line, optional key_value lines, and a
section separator to the header.
2. Format details
Bandwidth measurements MUST contain the following sections:
- Header (exactly once)
- Relay measurements (zero or more times)
2.1. Definitions
The following nonterminals are defined in dir-spec.txt, sections
1.2., 2.1.1., 2.1.3.:
Int
SP (space)
NL (newline)
Keyword
ArgumentChar
fingerprint (hexdigest)
nickname
Nonterminals defined in "Tor Directory List Format" (dir-list-spec.txt),
section 2.2.1.:
version_number
We define the following nonterminals:
value ::= ArgumentChar+
key_value ::= Keyword "=" value
line ::= ArgumentChar* NL
timestamp ::= Int
bandwidth ::= Int
relay_line ::= key_value (SP key_value)* NL
2.2. Header format
Some header lines MUST appear in specific positions, as documented below.
All other lines can appear in any order.
There MUST NOT be multiple key_value header lines with the same key.
It consists of:
timestamp NL
[At start, exactly once.]
The Unix Epoch time in seconds when the file was created.
"version=" version_number NL
[In second position, zero or one time.]
The specification document format version.
It uses semantic versioning [5].
This line has been added in version 1.1.0 of this specification.
Version 1.0.0 documents do not contain this line, and the
version_number is considered to be "1.0.0".
"software=" value NL
[Zero or one time.]
The name of the software that created the document.
This line has been added in version 1.1.0 of this specification.
Version 1.0.0 documents do not contain this line, and the software is
considered to be "torflow".
"software_version=" value NL
[Zero or one time.]
The version of the software that created the document.
The version may be a version_number, a git commit, or some other
version scheme.
This line has been added in version 1.1.0 of this specification.
"scanner_started=" timestamp NL
[Zero or one time.]
The Unix Epoch time in seconds when the scanner that generates the
measurements document started.
This line has been added in version 1.1.0 of this specification.
"earliest_measurement=" timestamp NL
[Zero or one time.]
The Unix Epoch time in seconds when the first relay measurement
was obtained.
This line has been added in version 1.1.0 of this specification.
key_value NL
[Zero or more times.]
Future format versions may include additional key_value header lines.
Additional header lines will be accompanied by a minor version
increment.
Implementations MAY add additional header lines as needed. This
specification SHOULD be updated to avoid conflicting meanings for the
same header keys.
Parsers MUST NOT rely on the order of these additional lines.
Additional header lines MUST NOT use any keywords specified in the
relay measurements format.
If a header line does not conform to this format, the line SHOULD be
ignored by parsers.
NL
[Zero or one time.]
The header ends.
This line has been added in version 1.1.0 of this specification.
For version 1.0.0 documents, the header ends when the first relay
measurement line is found conforming to the next section.
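As an illustration only, the header rules above could be parsed roughly
as in the following Python sketch. The names parse_header, is_relay_line
and KEY_VALUE_RE are hypothetical and do not come from Tor, Torflow or
sbws. The sketch assumes that lines have already been split and stripped
of their NL, that keys may contain underscores, that a line containing
both node_id= and bw= is a relay measurement line, and that a repeated
key is ignored after its first occurrence (the format only says
producers MUST NOT emit duplicates).

```python
import re

# Simplified Keyword "=" value: both sides non-empty, no whitespace.
# Allowing "_" in keys is an assumption based on keys like software_version.
KEY_VALUE_RE = re.compile(r"([A-Za-z0-9_-]+)=(\S+)")


def is_relay_line(line):
    """Return True if the line carries both node_id= and bw= key_values."""
    keys = {kv.split("=", 1)[0] for kv in line.split(" ") if "=" in kv}
    return {"node_id", "bw"} <= keys


def parse_header(lines):
    """Return (timestamp, header dict, index of the first relay line)."""
    if not lines or not re.fullmatch(r"[0-9]+", lines[0]):
        raise ValueError("first line must be a timestamp")
    timestamp = int(lines[0])
    header = {}
    index = 1
    while index < len(lines):
        line = lines[index]
        if line == "":
            index += 1  # skip the version 1.1.0 header terminator
            break
        if is_relay_line(line):
            break  # version 1.0.0 documents have no terminator
        match = KEY_VALUE_RE.fullmatch(line)
        # Ignore non-conforming lines, a version= line outside second
        # position, and (assumption) repeated keys after the first.
        if match and match.group(1) not in header:
            if match.group(1) != "version" or index == 1:
                header[match.group(1)] = match.group(2)
        index += 1
    # Defaults for documents that lack these lines.
    header.setdefault("version", "1.0.0")
    if header["version"] == "1.0.0":
        header.setdefault("software", "torflow")
    return timestamp, header, index
```

Returning the index of the first relay line lets a caller hand the
remaining lines to a separate relay measurement parser.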
2.3. Relay measurements format
It consists of zero or more relay_lines containing the measurement
results of relays, in arbitrary order.
There can be at most one relay_line per relay identity (fingerprint).
There MUST NOT be multiple key_value pairs with the same key in the same
relay_line.
Each relay_line MUST include the following key_value in arbitrary order:
"node_id=" fingerprint
[Exactly once.]
The fingerprint of the relay being measured.
"bw=" bandwidth
[Exactly once.]
The measured bandwidth of this relay.
Tor accepts zero bandwidths, but they trigger bugs in older Tor
implementations. Therefore, implementations SHOULD NOT produce zero
bandwidths. Instead, they SHOULD use one as their minimum bandwidth.
Multiple measurements can be aggregated using an averaging scheme, such
as a mean, median, or decaying average.
Torflow scales bandwidths to kilobytes per second. Other implementations
SHOULD use kilobytes per second for their initial bandwidth scaling.
If different implementations or configurations are used in votes for the
same network, their measurements MAY need further scaling. See
Appendix B for information about scaling, and one possible scaling
method.
key_value
[Zero or more times.]
Future format versions may include additional key_value pairs on a
relay_line.
Additional key_value pairs will be accompanied by a minor version
increment.
Implementations MAY add additional relay key_value pairs as needed. This
specification SHOULD be updated to avoid conflicting meanings for the
same relay keys.
Parsers MUST NOT rely on the order of these additional key_value pairs.
Additional key_value pairs MUST NOT use any keywords specified in the
header format.
If a relay line does not conform to this format, the line SHOULD be
ignored by parsers.
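As an illustration only, a relay_line could be parsed along the lines of
the following Python sketch. The function names parse_relay_line and
parse_relay_lines are hypothetical. The sketch assumes key_value pairs
are separated by single spaces, and it keeps the first line seen for a
repeated fingerprint; this document does not say which line a parser
should prefer in that case.

```python
def parse_relay_line(line):
    """Return a dict for a conforming relay_line, or None to ignore it."""
    pairs = {}
    for item in line.rstrip("\n").split(" "):
        key, sep, value = item.partition("=")
        if not sep or not key or not value or key in pairs:
            return None  # malformed key_value, or the same key twice
        pairs[key] = value
    # node_id and bw are the only mandatory keys.
    if "node_id" not in pairs or "bw" not in pairs:
        return None
    if not (pairs["bw"].isascii() and pairs["bw"].isdigit()):
        return None  # bandwidth must be an Int
    pairs["bw"] = int(pairs["bw"])
    return pairs


def parse_relay_lines(lines):
    """Parse relay lines, keeping at most one entry per fingerprint."""
    relays = {}
    for line in lines:
        relay = parse_relay_line(line)
        # Assumption: on a repeated fingerprint, keep the first line.
        if relay is not None and relay["node_id"] not in relays:
            relays[relay["node_id"]] = relay
    return relays
```

Unknown keys are kept as strings, so additional key_value pairs from
future format versions pass through without breaking the parser.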
2.4. Implementation notes
2.4.1. Simple Bandwidth Scanner
Every relay measurement in sbws version 0.1.0 consists of:
"node_id=" fingerprint SP
As above.
"bw=" bandwidth SP
As above.
"nick=" nickname SP
[Exactly once.]
The relay nickname.
"rtt=" Int SP
[Exactly once.]
The Round Trip Time in milliseconds to obtain 1 byte of data.
"time=" timestamp NL
[Exactly once.]
The Unix Epoch time in seconds when the last measurement was performed.
2.4.2. Torflow
Torflow relay lines include node_id and bw, and other key_value pairs [2].
References:
1. https://gitweb.torproject.org/torflow.git
2. https://gitweb.torproject.org/torflow.git/tree/NetworkScanners/BwAuthority/…
3. https://gitweb.torproject.org/torspec.git/tree/dir-spec.txt
4. https://metrics.torproject.org/onionoo.html#details
5. https://semver.org/
A. Sample data
The following has not been obtained from any real measurement.
A.1. Generated by Torflow
This is an example version 1.0.0 document:
1523911758
node_id=$68A483E05A2ABDCA6DA5A3EF8DB5177638A27F80 bw=760 nick=Test
measured_at=1523911725 updated_at=1523911725 pid_error=4.11374090719
pid_error_sum=4.11374090719 pid_bw=57136645 pid_delta=2.12168374577
circ_fail=0.2 scanner=/filepath
node_id=$96C15995F30895689291F455587BD94CA427B6FC bw=189 nick=Test2
measured_at=1523911623 updated_at=1523911623 pid_error=3.96703337994
pid_error_sum=3.96703337994 pid_bw=47422125 pid_delta=2.65469736988
circ_fail=0.0 scanner=/filepath
A.2. Generated by sbws version 0.1.0
1523911758
version=1.1.0
software=sbws
software_version=0.1.0
scanner_started=1523911756
earliest_measurement=1523911757
node_id=$68A483E05A2ABDCA6DA5A3EF8DB5177638A27F80 bw=760 nick=Test
rtt=380 time=1523911725
node_id=$96C15995F30895689291F455587BD94CA427B6FC bw=189 nick=Test2
rtt=378 time=1523911623
B. Scaling bandwidths
B.1. Scaling requirements
Tor accepts zero bandwidths, but they trigger bugs in older Tor
implementations. Therefore, scaling methods SHOULD perform the
following checks:
* If the total bandwidth is zero, all relays should be given equal
bandwidths.
* If the scaled bandwidth is zero, it should be rounded up to one.
Initial experiments indicate that scaling may not be needed for
torflow and sbws, because their measured bandwidths are similar
enough already.
B.2. A linear scaling method
If scaling is required, here is a simple linear bandwidth scaling
method, which ensures that all bandwidth votes contain approximately
the same total bandwidth:
1. Calculate the relay quota by dividing the total measured bandwidth
in all votes, by the number of relays with measured bandwidth
votes. In the public tor network, this is approximately 7500 as of
April 2018. The quota should be a consensus parameter, so it can be
adjusted for all scanners on the network.
2. Calculate a vote quota by multiplying the relay quota by the number
of relays this bandwidth authority has measured
bandwidths for.
3. Calculate a scaling factor by dividing the vote quota by the
total unscaled measured bandwidth in this bandwidth
authority's upcoming vote.
4. Multiply each unscaled measured bandwidth by the scaling
factor.
Now, the total scaled bandwidth in the upcoming vote is
approximately equal to the vote quota.
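As an illustration only, steps 2 to 4 above, together with the checks
from B.1, could look like the following Python sketch. The function name
scale_bandwidths and the relay_quota parameter are hypothetical. The
relay quota is taken as an input (for example from a consensus
parameter) rather than computed from all votes, and when the total is
zero the sketch assumes that "equal bandwidths" means giving each relay
the relay quota.

```python
def scale_bandwidths(measured, relay_quota=7500):
    """Linearly scale {fingerprint: bandwidth} towards the vote quota."""
    if not measured:
        return {}
    total = sum(measured.values())
    if total == 0:
        # B.1: with a zero total, give every relay an equal bandwidth.
        # Using relay_quota as that value is an assumption of this sketch.
        return {fp: relay_quota for fp in measured}
    vote_quota = relay_quota * len(measured)  # step 2
    factor = vote_quota / total               # step 3
    # Step 4, then B.1: round a scaled bandwidth of zero up to one.
    return {fp: max(1, round(bw * factor)) for fp, bw in measured.items()}
```

Because of rounding and the minimum of one, the scaled total only
approximates the vote quota.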
B.3. Quota changes
If all scanners are using scaling, the quota can be gradually
reduced or increased as needed. Smaller quotas decrease the size
of uncompressed consensuses, and may decrease the size of
consensus diffs and compressed consensuses. But if the relay
quota is too small, some relays may be over- or under-weighted.
Next meeting is 10 May 2018 at 1200 UTC for 30 minutes... maybe. While
we should find a standard time, next week is a special case for pastly
and teor.
Notes are https://pad.riseup.net/p/ioYq89yZSx1t and copy/pasted below.
---------------------------------------------------------------------
This pad:
https://pad.riseup.net/p/ioYq89yZSx1t
Meetbot log:
http://meetbot.debian.net/tor-meeting/2018/tor-meeting.2018-05-03-09.29.html
Last week
https://lists.torproject.org/pipermail/tor-dev/2018-April/013108.html
Next Milestone:
1st Status Update: May 25th
All Milestones:
https://trac.torproject.org/projects/tor/wiki/doc/gsoc
######################## Updates
pastly:
- Made significant strides towards the switch to HTTP/S
- sbws launches Tor for itself
- started tagging and incrementing and signing versions
teor:
- reviewed the bandwidth format spec
- feedback on tor bwfile tests and bug fixes
- feedback on sbws / tor launching
- the sbws vs torflow averages in the testnet are stable and consistent
- fixed a buffer read-past-the-end when the file can't be read in
the bwfile parsing code
juga:
Last week:
* sbws:
* Worked on "Add what to include in a source distribution"
(issues/132)
* Discussed with pastly about "Include file where to write
``generate`` results in the config?"
* Created "Fix version, prepare for future release"
(issues/131): pastly started to tag versions, what is needed for
packaging/distributing sbws (whatever the method will be)
* Worked on "Include software and software_version headers"
(issues/96), waiting to finish the spec so that we don't keep switching formats
* Worked on "Add useful information in sbws header lines"
(issues/119): PR 130 waiting to finish spec
* spec:
* Worked with teor on the new version sent to @tor-dev
* little-tor:
* Worked on "Allow additional header lines" (#25960): also
waiting for the specs
* Started to write tests for the previous and "Create unit
test for dirserv_read_measured_bandwidths" (#25947)
Next week:
* spec: wait for comments and make changes accordingly
* little-tor:
* continue with #25960 (when spec ready)
* finish #25947, tests for current code, additional header
lines and possible refactor
* sbws: work on 96, 119 when spec is ready
######################## Discussions
-------------- sbws logo offer??
Is this something we care about?
Decision: not really, but ux might like the help
-------------- Meeting time
0930 UTC is terribly early for pastly, but he can do it in the name of
collab.
1130 UTC would actually be easier for teor, but that's about as late as
they can go
For juga both times are fine ^
Decision: from 12:00 to 12:30 UTC
Decision: 1200 UTC and only 30m long next week
-------------- Bandwidth spec
Bugfixes and incremental improvements, or a rewrite?
Should I create a new version, include the changes or wait for more
comments?
Decision: make minor changes if there is time
-------------- Bandwidth file parsing code in Tor
unit tests are ok
bugfixes are ok
What does this mean? ^
we can always do tests and bugfixes
If we implement the format, we might have to change it
-------------- Bandwidth file generation
If we implement the format, we might have to change it
pastly thinks he/we can easily handle whatever crazy thing you all come
up with for the Tor side of code as long as it follows the general idea
of one-line-per-relay, simple integers, and maybe a header with some
metadata
juga thinks there is no crazy thing, and parsing strings in python is
way easier than in c :)
Out of 9900 possible two hop tor circuits among the top 100 tor relays
only 935 circuit builds have succeeded. This is way worse than the last
time I sent a report 6 months ago during the Montreal tor dev meeting.
Here's the scanner I use:
https://github.com/david415/tor_partition_scanner
(I was planning on improving this testing methodology in collaboration with
Katharina Kohls but was unable to travel to Bochum University because of
visa limitations. It was either go to tor-dev meeting or Bochum but not both.)
Here's the gist of my simple testing methodology:
https://gist.github.com/david415/9875821652018431dd6d6c4407bb90c0#file-dete…
Here's exactly how I performed the scan to get those results:
wget https://collector.torproject.org/recent/relay-descriptors/consensuses/2018-…
./helpers/query_fingerprints_from_consensus_file.py 2018-03-13-01-00-00-consensus > top100.relays
detect_partitions.py --tor-control tcp:127.0.0.1:9051 --log-dir ./ --status-log ./status_log \
--relay-list top100.relays --secret secretTorEmpireOfRelays --partitions 1 --this-partition 0 \
--build-duration .25 --circuit-timeout 60 --log-chunk-size 1000 --max-concurrency 100
echo "select first_hop, second_hop from scan_log where status = 'failure';" | sqlite3 scan1.db | wc -l
8942
echo "select first_hop, second_hop from scan_log where status = 'timeout';" | sqlite3 scan1.db | wc -l
23
echo "select first_hop, second_hop from scan_log where status = 'success';" | sqlite3 scan1.db | wc -l
935
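As a quick sanity check of the numbers above, here is a small Python
sketch. It assumes the scanner treats (first_hop, second_hop) as ordered
pairs of distinct relays, which is what 100 * 99 = 9900 suggests; the
variable names are only illustrative.

```python
relays = 100
possible = relays * (relays - 1)  # ordered pairs of distinct relays

# Row counts reported by the three sqlite3 queries above.
counts = {"failure": 8942, "timeout": 23, "success": 935}

# Every possible circuit should appear in exactly one status bucket.
assert sum(counts.values()) == possible

success_rate = counts["success"] / possible
print(f"{success_rate:.1%}")  # prints 9.4%
```

So fewer than one in ten of the attempted two hop circuit builds
succeeded in this scan.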
Hi All,
Looking at the recent work on the Tor bandwidth measurements document
format, I've noticed there are a few places where language can be
ambiguous [0]. This morning, I noticed more ambiguity in some metrics
tools [1]. We could do with a glossary.
The problem though is not the lack of a glossary, but that we have at
least 3 [2][3][4].
I've just been discussing this with juga in IRC. For the Metrics
glossary, I think (but please correct me if I'm wrong) that the Metrics
team would be happy to have our glossary's definitions match up with
torspec. I also think it would be cool if we only add new terms to the
Metrics glossary if they have a corresponding term in torspec's glossary.
Would the torspec maintainers be happy to review and merge patches for
new terms to facilitate that?
The community glossary is perhaps more broad than the Metrics or torspec
glossaries, and so having all those terms in torspec would probably not
be a useful thing to do.
Would it be agreeable with the community team that terms that are
already defined in torspec should not be overloaded and that new terms
shouldn't be defined in torspec if they would have conflicting meanings
with terms defined in the community glossary?
If you have other ideas, then please do also suggest them.
Thanks,
Iain.
[0] https://lists.torproject.org/pipermail/tor-dev/2018-May/013145.html
[1] https://lists.torproject.org/pipermail/tor-relays/2018-May/015132.html
[2] https://gitweb.torproject.org/torspec.git/tree/glossary.txt
[3] https://metrics.torproject.org/glossary.html
[4] https://trac.torproject.org/projects/tor/wiki/doc/community/glossary