tor-dev March 2015

tor-dev@lists.torproject.org

76 participants
78 discussions

Re: [tor-dev] Progress on hidserv-stats Metrics integration, request for code review
by Karsten Loesing 13 Mar '15


[Cc'ing tor-dev@, because why not.]

On 11/03/15 19:13, Karsten Loesing wrote:
> Please let me know if I can help *reduce* confusion somehow. :)

Looking forward, hidden-service statistics are now available on Metrics:

https://metrics.torproject.org/hidserv-data.html

I also started making some very quick graphs here:

https://people.torproject.org/~karsten/volatile/hidserv-stats-2015-03-11.pdf

The question is, what graphs do we want on Metrics? How about:

- Total hidden-service traffic in Mbit/s (per day, using weighted
  interquartile mean, like lower graph on page 1 of the PDF)

- Unique .onion addresses (per day, using weighted interquartile
  mean, like upper graph on page 1 of the PDF)

- Fraction of relays reporting hidden-service statistics (containing
  both dir-onions-seen and rend-relayed-cells, like page 3 of the PDF)

Note that I left out "fraction of traffic", because we can't guarantee that the many assumptions we made for the blog post will hold in the future. Happy to be convinced otherwise.

Also note that more is not necessarily better. All graphs we put on Metrics should be easy to comprehend for non-researchers and non-developers. If there's a graph that you care about but that not many other people would care about, it's easier to write a graphing script to plot what's in hidserv.csv rather than add yet one more thing to Metrics.

By the way, I decided against using onion service terminology, because I wasn't sure when we were planning to switch. I'm not sure if Metrics should be one of the first Tor websites to switch, or whether people will just wonder what crazy Tor-unrelated stuff Metrics has statistics for. I don't feel strongly though.

Thoughts?

Thanks!

All the best,
Karsten


Re: [tor-dev] Progress on hidserv-stats Metrics integration, request for code review
by Karsten Loesing 13 Mar '15


On 12/03/15 17:08, A. Johnson wrote:
>> Looking forward, hidden-service statistics are now available on
>> Metrics:
>>
>> https://metrics.torproject.org/hidserv-data.html
>
> Looks great!
>
>> - Total hidden-service traffic in Mbit/s (per day, using
>> weighted interquartile mean, like lower graph on page 1 of the
>> PDF)
>>
>> - Unique .onion addresses (per day, using weighted interquartile
>> mean, like upper graph on page 1 of the PDF)
>
> These seem like a good idea.

Great! I started with the second graph, because it seems least disputed:

https://metrics.torproject.org/hidserv-dir-onions-seen.html

>> - Fraction of relays reporting hidden-service statistics
>> (containing both dir-onions-seen and rend-relayed-cells, like
>> page 3 of the PDF)
>
> This is probably less interesting to most people, but it is
> important to people serious about understanding the meaning of the
> data. So I could take this or leave it.

Agreed. I'll leave this graph out for the moment.

>> Note that I left out "fraction of traffic", because we can't
>> guarantee that our many assumptions we made for the blog post
>> will hold in the future. Happy to be convinced otherwise.
>
> The calculation of client traffic fraction assumed that most
> traffic from exit relays was in fact exit traffic. The validity of
> that assumption may indeed change in the future, depending
> especially on how the consensus position weights change. So I agree
> that it is not a great idea to include a graph of this number on
> the Metrics page.

I wonder if we can simplify the calculation somehow, so that we don't have to worry (as much) that it will break in the future. Hmm.

> The calculation of traffic fraction at relays only relied on (i)
> rendezvous circuits being six hops (not a shaky assumption) and
> (ii) that the Metrics numbers for total network traffic was
> accurate (also seems like a good assumption). So it seems that we
> could include this number, although it is the less interesting of
> the two numbers.

True. Let's keep this in mind as plan B.

>> By the way, I decided against using onion service terminology,
>> because I wasn't sure when we were planning to switch. I'm not
>> sure if Metrics should be one of the first Tor websites to
>> switch, or whether people will just wonder what crazy
>> Tor-unrelated stuff Metrics has statistics for. I don't feel
>> strongly though. Thoughts?
>
> You could use the new terminology, with a footnote on the page
> explaining that "onion service" is the same as "hidden service".

I think I'd rather wait until documentation on Tor's website and in tools is updated.

All the best,
Karsten


Summary of meek's costs, February 2015
by David Fifield 13 Mar '15


Here's the summary of meek's CDN fees for February 2015.

Earlier reports:
https://lists.torproject.org/pipermail/tor-dev/2014-August/007429.html
https://lists.torproject.org/pipermail/tor-dev/2014-October/007576.html
https://lists.torproject.org/pipermail/tor-dev/2014-November/007716.html
https://lists.torproject.org/pipermail/tor-dev/2014-December/007916.html
https://lists.torproject.org/pipermail/tor-dev/2015-January/008082.html
https://lists.torproject.org/pipermail/tor-dev/2015-February/008235.html

                 App Engine  +    Amazon  +  Azure  =  total by month
February  2014        $0.09  +        --  +     --  =      $0.09
March     2014        $0.00  +        --  +     --  =      $0.00
April     2014        $0.73  +        --  +     --  =      $0.73
May       2014        $0.69  +        --  +     --  =      $0.69
June      2014        $0.65  +        --  +     --  =      $0.65
July      2014        $0.56  +     $0.00  +     --  =      $0.56
August    2014        $1.56  +     $3.10  +     --  =      $4.66
September 2014        $4.02  +     $4.59  +  $0.00  =      $8.61
October   2014       $40.85  +   $130.29  +  $0.00  =    $171.14
November  2014      $224.67  +   $362.60  +  $0.00  =    $587.27
December  2014      $326.81  +   $417.31  +  $0.00  =    $744.12
January   2015      $464.37  +   $669.02  +  $0.00  =   $1133.39
February  2015      $650.53  +   $604.83  +  $0.00  =   $1255.36
--
total by CDN       $1715.53  +  $2191.74  +  $0.00  =   $3907.27  grand total

My motivation behind making these reports is to provide transparency and to contribute to the body of scientific knowledge. I think it's useful to document these kinds of numbers, for the benefit of others who want to experiment with this kind of system.

The number of simultaneous users was up a little bit in February relative to December, hovering around 1250.
https://metrics.torproject.org/userstats-bridge-transport.html?graph=userst…

The bill for Amazon actually went down last month, for the first time ever. Okay, that is partially because February is about 10% shorter than January. Actually, if you account for the shorter month, the bill is eerily almost exactly the same:

    $669.02 * 28/31 = $604.28 ≈ $604.83 (actual Feb. cost)

== App Engine a.k.a. meek-google ==

Bandwidth was up about 30% and instance hours were up about 45%. At the end of February, I tried tweaking some performance parameters as an experiment to cause fewer instances to be created. This is to reduce costs, and also because I guess that extra instances don't really help much with performance in this case. The creation of instances is triggered by the application's response latency, which for us is usually caused by some temporary network issue, rather than a lack of CPU or something like that. We'll see next month whether it has an effect.

Here is how the Google costs broke down:

    4114 GB              $492.71
    3226 instance hours  $157.82

Compared to the previous month:

    2944 GB              $353.31
    2221 instance hours  $111.06

https://globe.torproject.org/#/bridge/88F745840F47CE0C6A4FE61D827950B06F9E4…

== Amazon a.k.a. meek-amazon ==

meek-amazon was pretty much the same as last month. I am thinking about disabling the public Amazon backend, just because it's a bit more expensive and I haven't been able to get free credits for it. And when I say "disabling," I mean just removing it as a default option in Tor Browser; it wouldn't just stop working immediately. My idea is that we could publish a guide on setting up your own Amazon CloudFront instance, if CloudFront is a backend that works for you.

    Asia Pacific (Singapore)     115M requests  $138.07   1018 GB  $147.04
    Asia Pacific (Sydney)         76K requests    $0.10      1 GB    $0.06
    Asia Pacific (Tokyo)          29M requests   $34.26    193 GB   $24.99
    EU (Ireland)                 111M requests  $133.22    868 GB   $68.23
    South America (Sao Paulo)      2M requests    $4.25     11 GB    $2.61
    US East (Northern Virginia)   30M requests   $29.90    278 GB   $22.12
    --
    total                        363M requests  $339.80   2369 GB  $265.05

https://globe.torproject.org/#/bridge/3FD131B74D9A96190B1EE5D31E91757FADA1A…

== Azure a.k.a. meek-azure ==

I haven't been able to get estimated costs out of Azure. I didn't get a reply to my request last month to be added to whatever plan lets you see usage data. I did, however, recover the bandwidth history for the meek-azure bridge from Onionoo, so we can estimate what the cost would be. If we estimate a traffic mix similar to that of meek-amazon, with 40% coming from North America and Europe, and 60% coming from elsewhere, then the costs would be:

https://onionoo.torproject.org/bandwidth?fingerprint=AA033EEB61601B2B7312D8…

    2014-09   47 GB   $5.53
    2014-10  298 GB  $35.04
    2014-11  500 GB  $58.80
    2014-12  512 GB  $60.21
    2015-01  638 GB  $75.03
    2015-02  614 GB  $72.21

https://globe.torproject.org/#/bridge/AA033EEB61601B2B7312D89B62AAA23DC3ED8…

David Fifield
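The month-length adjustment in the report above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the two dollar figures are taken from the cost table, while the function name `prorate` and its parameters are made up for illustration.

```python
def prorate(cost, days_from, days_to):
    """Scale a monthly bill to a month of a different length.

    Hypothetical helper: assumes cost grows linearly with the number of days.
    """
    return cost * days_to / days_from


# Amazon bills as listed in the cost table of the report.
january_amazon = 669.02
february_amazon = 604.83

# What January's bill would have been if January had only 28 days.
expected = prorate(january_amazon, days_from=31, days_to=28)

print(f"January scaled to 28 days: ${expected:.2f}")
print(f"Actual February bill:      ${february_amazon:.2f}")
print(f"Difference:                ${february_amazon - expected:.2f}")
```

The same helper could be applied to the App Engine column to separate real growth from the effect of the shorter month.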


Two TOR questions
by John Lee 13 Mar '15



Renaming arm
by Kenneth Freeman 12 Mar '15


Surely Everyone Tors Here

Although I think Argus a better descriptive name. Good luck backronyming THAT.


Preliminary Debian packages for meek
by Ximin Luo 11 Mar '15


Source package: https://mentors.debian.net/package/meek
Binary packages: https://people.torproject.org/~infinity0/bin/

From the binaries, one should install both meek-client and xul-ext-meek-http-helper; the latter depends on xvfb and xauth.

meek-client has been modified slightly to run meek-http-helper in a headless firefox using Xvfb(1), so that it works even when run as a system service. Example torrc:

~~~~
UseBridges 1
Bridge meek 0.0.2.0:1 url=https://meek-reflect.appspot.com/ front=www.google.com
ClientTransportPlugin meek exec /usr/bin/meek-client-wrapper --log /var/log/tor/meek-client-wrapper.log --helper /usr/bin/meek-browser-helper -- /usr/bin/meek-client --log /var/log/tor/meek-client.log
~~~~

Also, to have meek-client-wrapper clean up child processes correctly one needs to apply the patch mentioned here: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=779608#20

After running `service tor restart`, you'll end up with something like this:

    $ sudo ls -1a /var/lib/tor/
    .
    ..
    .cache
    cached-certs
    cached-descriptors
    cached-descriptors.new
    cached-microdesc-consensus
    cached-microdescs
    cached-microdescs.new
    .dbus
    Desktop
    .gconf
    .gnome2
    .gnome2_private
    .lesshst
    lock
    .mozilla
    state

This is not ideal but I don't know of a good way around it - suggestions welcome.

X

--
GPG: 4096R/1318EFAC5FBBDBCE
git://github.com/infinity0/pubkeys.git


Update on Roadmaps for Core Tor, Hidden Services and Tor Obfuscation
by Isabela 11 Mar '15


Hello everyone.

At the Valencia dev meeting we did an exercise to help create a roadmap (1yr) for the different Tor projects and organizational work. I've collected those roadmaps and published them here:

https://trac.torproject.org/projects/tor/wiki/org/roadmaps

(I will be sending a similar email to the other dev and org teams as well.)

What I would like to ask from the folks on this list who collaborate with *Core Tor, Hidden Services and Tor Obfuscation (PT, BridgeDB etc)* is to look at the tables for March and April and fill in the empty cells for:

* sponsor -> is the task related to a sponsor? (or could be)
* who -> person(s) who the task should be assigned to
* tickets -> ticket(s) related to the task (please create them if appropriate)
* date -> estimation dates are always good :)

nickm has already created tickets for Core Tor tasks (Thanks Nick!)

Let me know if you have questions. Here are the tables I am talking about:

* March and April for Core Tor: https://trac.torproject.org/projects/tor/wiki/org/roadmaps/CoreTor
* March and April for Hidden Services: https://trac.torproject.org/projects/tor/wiki/org/roadmaps/HS
* March and April for Tor Obfuscation: https://trac.torproject.org/projects/tor/wiki/org/roadmaps/TorObfuscation

thanks everyone,
Isabela


Suggestions for Hidden Services Statistics Project
by Gautham Nekkanti 11 Mar '15


Hi,

I have already shared my project idea for a simple analytics tool for hidden service providers. I am looking for suggestions and input from users who have experience hosting hidden services. Obviously, it would include all the usual traffic-related statistics. Which Tor-specific statistics do you think would be helpful? As meejah suggested, we are thinking of including a few Tor-specific statistics such as 'How many times their onion descriptor was requested', 'Number of bytes transferred through circuits', etc.

Thanks


Two protocols to measure relay-sensitive hidden-service statistics
by A. Johnson 10 Mar '15


Hello tor-dev,

While helping design ways to publish statistics about hidden services in a privacy-preserving manner, it has become clear to me that certain statistics cannot be safely reported using the current method of having each relay collect and report measurements. I am going to describe a couple of simple protocols to handle this problem that I think should be implementable without much effort. I'd be happy to get feedback in particular about the security or ease-of-implementation of these protocols.

Two HS statistics that we (i.e. people working on Sponsor R) are interested in collecting are:

  1. The number of descriptor fetches received by a hidden-service directory (HSDir)
  2. The number of client introduction requests at an introduction point (IP)

The privacy issue with #1 is that the set of HSDirs is (likely) unique to an HS, and so the number of descriptor fetches at its HSDirs could reveal the number of clients it had during a measurement period. Similarly, the privacy issue with #2 is that the set of IPs is (likely) unique to an HS, and so the number of client introductions at its IPs could reveal the number of client connections it received.

An approach to solve this problem would be to anonymize the reported statistics. Doing so raises a couple of challenges, however:

  1. Anonymous statistics should be authenticated as coming from some relay. Otherwise, statistics could be polluted by any malicious actor.
  2. Statistical inference should be made robust to outliers. Without the relay identities, it will be difficult to detect and remove values that are incorrect, whether due to faulty measurement or malicious action by a relay.

I propose some simple cryptographic techniques to privately collect the above statistics while handling the above challenges.
I assume that there exists a set of Statistics Authorities (StatAuths), of which at least one must be honest for the protocol to be secure, but all of which can be "curious" (that is, we want to maintain privacy from them as well and allow their action to be completely public). The Directory Authorities could serve as StatAuths. A single server could as well, if you trust it to be honest-but-curious. If you have a trusted non-curious server, things become much simpler, of course: just have relays report their values to the server and then have it publish a global aggregate only.

The AnonStats1 protocol to privately publish both statistics if we trust relays not to pollute the statistics (i.e. #2 is not a problem) is as follows:

  1. Each StatAuth provides 2k partially-blind signatures on authentication tokens to each relay in a consensus during the measurement period. The blind part of a signed token is simply a random number chosen by the relay. The non-blind part of a token consists of the start time of the current measurement period. The 2k tokens will allow the relay to submit k values to the StatAuths. Note that we could avoid using partially-blind signatures by changing keys at the StatAuths every measurement period and then simply providing blind signatures on random numbers.
  2. At the end of the measurement period, for each statistic, each relay uploads the following, each on its own Tor circuit and accompanied by a unique token from each StatAuth:
       1. The count
       2. The "statistical weight" of the relay (1/(# HSDirs) for statistic #1 and the probability of selection as an IP for statistic #2)
  3. The StatAuths verify that each uploaded value is accompanied by a unique token from each StatAuth that is valid for the current measurement period. To infer the global statistic from the anonymous per-relay statistic, the StatAuths add the counts, add the weights, and divide the former by the latter.
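The inference in step 3 of AnonStats1 can be sketched in Python as follows. This only illustrates the arithmetic (sum of counts divided by sum of weights); token issuance, blind signatures and verification are left out entirely, and the function name `infer_global` and the sample numbers are made up for the example.

```python
def infer_global(counts, weights):
    """Estimate the network-wide value of a statistic, AnonStats1 style.

    counts:  anonymously uploaded per-relay counts
    weights: anonymously uploaded per-relay statistical weights

    The two lists are deliberately unpaired: each value arrives on its own
    circuit, so a StatAuth can only add up each kind of value separately.
    """
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("need at least one positive statistical weight")
    return sum(counts) / total_weight


# Made-up example: 3 of 4 HSDirs report, each with weight 1/(# HSDirs) = 0.25.
counts = [30, 50, 40]
weights = [0.25, 0.25, 0.25]
print(infer_global(counts, weights))  # 120 observed / 0.75 coverage = 160.0
```

Because only the two sums are used, a single relay reporting a huge count shifts the estimate by the same amount, which is the weakness AnonStats2 is meant to address.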
The AnonStats1 protocol is vulnerable to a relay that publishes manipulated statistics (i.e. challenge #2). If this is a concern, the AnonStats2 protocol mitigates it by using a median statistic instead of a sum:

  1. Relays are divided into b bins by consensus weight. The bins have exponentially-increasing length, and the rate of increase c is set such that the ith bin by increasing weights has at least r relays each of weight at least some minimum min_weight (say, r=500, min_weight=100). The first bin has weights in [0, min_weight), and the ith bin has weights in [min_weight*c^(i-2), min_weight*c^(i-1)).
  2. Each StatAuth provides k partially-blind signatures on authentication tokens to each relay in a consensus during the measurement period. The blind part of a signed token is simply a random number chosen by the relay. The non-blind part of a token consists of the start time of the current measurement period and the bin containing the relay's consensus weight.
  3. At the end of the measurement period, for each statistic, each relay divides each statistic by the relay's "statistical weight" (1/(# HSDirs) for statistic #1 and the probability of selection as an IP for statistic #2). The result is the relay's estimate of the global value of the statistic. The relay then uploads this value via a new Tor circuit, accompanied by a unique token from each StatAuth.
  4. The StatAuths verify that each uploaded value is accompanied by a unique token from each StatAuth that is valid for the current measurement period and that contains the same relay-weight bin. To infer a final global statistic from the anonymous per-relay estimates, the StatAuths use the weighted median of the received estimates, where the weight of an estimate is taken to be the smallest value of its accompanying bin (i.e. the bin's left edge).
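The binning in step 1 and the weighted median in step 4 of AnonStats2 can be sketched in Python as follows. This is a rough illustration under stated assumptions, not a specification: the function names are invented, c=2 and the sample reports are made up, and "weighted median" is taken to mean the smallest estimate at which the cumulative weight reaches half of the total, which the post does not spell out. Token handling is omitted.

```python
def bin_index(weight, min_weight, c):
    """1-based bin for a consensus weight.

    Bin 1 is [0, min_weight); bin i >= 2 is
    [min_weight * c**(i-2), min_weight * c**(i-1)).
    """
    if weight < min_weight:
        return 1
    i, upper = 2, min_weight * c
    while weight >= upper:
        upper *= c
        i += 1
    return i


def bin_left_edge(i, min_weight, c):
    """Smallest consensus weight that falls into bin i."""
    return 0 if i == 1 else min_weight * c ** (i - 2)


def weighted_median(values, weights):
    """Smallest value whose cumulative weight reaches half the total (assumed definition)."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("total weight must be positive")
    running = 0
    for value, weight in sorted(zip(values, weights)):
        running += weight
        if running >= total / 2:
            return value


def aggregate(reports, min_weight, c):
    """reports: (estimate, bin) pairs as seen by a StatAuth; returns the global statistic."""
    estimates = [estimate for estimate, _ in reports]
    weights = [bin_left_edge(b, min_weight, c) for _, b in reports]
    return weighted_median(estimates, weights)


# Made-up reports: (relay's global estimate, relay's consensus weight).
MIN_WEIGHT, C = 100, 2
raw = [(1000, 50), (30000, 150), (32000, 450), (31000, 900), (10**9, 120)]
reports = [(est, bin_index(w, MIN_WEIGHT, C)) for est, w in raw]
print(aggregate(reports, MIN_WEIGHT, C))  # the 10**9 outlier does not move the median
```

Note that estimates from the first bin get weight 0 under the left-edge rule, so relays below min_weight cannot influence the result at all in this sketch.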


Using Traceroute for AS-Path prediction
by Simon Koch 10 Mar '15


I am a student at Saarland University and currently working on my bachelor thesis concerning AS-path prediction using traceroute. I want to correlate open-source BGP data and corresponding traceroute measurements. In the end I want to argue whether or not traceroute can be a valid tool for live AS-path prediction, based on the matching and representation of changes in the respective (AS-)routes over time.

I already did a preliminary measurement during which I gathered two months' worth of traceroute information to different Tor nodes and correlated it with the BGP data available at RIPE-RIS. It turned out that routes predominantly matched and that a high percentage of BGP route changes were also present in traceroute routes, though only a smaller number of traceroute route changes were present in BGP routes. This leads to the assumption that traceroute might be a good AS-path measurement tool, as it would be quite unlikely that BGP AS paths would match AS paths derived from traceroute measurements if traceroute could not reliably measure them. This information was inferred from only one measurement point, though.

As this topic may also be of interest to Tor, I was wondering if anyone had some opinions on the general idea or knew some papers/researchers which are on the same track. I already read:

* Towards an Accurate AS-Level Traceroute Tool
* Quantifying the Pitfalls of Traceroute in AS Connectivity
* Inferring AS-level Internet Topology from Router-Level Path Traces
* A Study on Traceroute Potential in Revealing the Internet AS-Level Topology
* Mixing Biases: Structure Changes in the AS Topology Evolution

but I have the nagging feeling that there should be more.

Further, I am looking for a way to establish more measurement points to ensure diversity in the data collection. Those points have to be in AS-peers of either the route-views project (zebra routers) or RIPE-RIS so I can also access the corresponding MRT data. I figured that the traceroute.org Looking Glass servers may provide a way of doing so, but I am afraid that the overlap of Looking Glass servers and public BGP peers is very small.

Any idea on how to establish diverse measurement points (maybe volunteers) would be greatly appreciated.

Regards,
Simon
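One simple way to decide whether a traceroute-derived AS path "matches" a BGP AS path is sketched below in Python. This is an assumption about how such a comparison could be done, not the method used in the thesis: hops that could not be mapped to an AS are dropped, and runs of the same AS (several router hops inside one AS, or BGP path prepending) are collapsed to a single entry before comparing. The function names are invented, and the AS numbers come from the range reserved for documentation.

```python
from itertools import groupby


def collapse(path):
    """Drop unmapped hops (None) and merge consecutive repeats of the same AS."""
    known = [asn for asn in path if asn is not None]
    return [asn for asn, _ in groupby(known)]


def same_as_path(traceroute_asns, bgp_as_path):
    """True if both paths traverse the same sequence of ASes."""
    return collapse(traceroute_asns) == collapse(bgp_as_path)


# Per-hop AS mapping of a traceroute (None = unresponsive or unmapped hop).
traceroute_asns = [64496, 64496, None, 64497, 64497, 64499]
# AS path from a BGP table dump, with the origin AS prepended once.
bgp_as_path = [64496, 64497, 64499, 64499]

print(collapse(traceroute_asns))
print(same_as_path(traceroute_asns, bgp_as_path))
```

A stricter comparison would treat an unmapped hop between two different ASes as a possible missing AS instead of silently dropping it.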
