Hi everyone,
some of you may already know our new approach to estimating daily Tor users:
https://metrics.torproject.org/users.html#userstats
This new approach has been in beta since April, and I'm quite happy with it. I trust the new numbers more than the old ones, both for direct users and bridge users. The new code for direct users is quite similar to the old one, but much cleaner. The approach for bridge users is a much better idea than the old hack. Today I added the missing features, such as the top-10 lists and the censorship detector.
Why am I telling you this?
Because the old approach uses resources on our poor, already overloaded metrics machine, and I'm planning to shut down the old approach in the very near future. Here's the plan:
- Compute user numbers for 2012 and before; the current numbers start on January 1, 2013. This is going to take at least until September 23.
- Take out the "BETA" labels and throw out everything above "New approach to estimating daily Tor users (BETA)". This could happen on October 1.
Thoughts? Did I miss anything that's worth keeping? Anyone want to create an archive of their favorite graphs before I pull the plug?
All the best, Karsten
I would actually really appreciate the old numbers (from ~2007 through 8/2013) being kept online. Estimating growth over time and mapping spikes is kind of a big deal to me. =)
~Griffin
On 9/16/13 8:36 PM, Griffin Boyce wrote:
I would actually really appreciate the old numbers (from ~2007 through 8/2013) being kept online. Estimating growth over time and mapping spikes is kind of a big deal to me. =)
I see. How about I put the CSV files for direct and bridge users, estimated with the old approach and ending on September 30, 2013, on https://metrics.torproject.org/data.html, together with a short description?
Also, the new graphs will reach back to about September 2010 once I'm done crunching numbers.
All the best, Karsten
Hi Karsten,
Awesome work with these metrics. I am curious, however, why the new metrics are more correct than the old ones. The graphs are definitely smoother and including the requests to the Dir Auths seems correct, but are there other specific reasons for this choice?
Thanks for your work on this! Matt
On 9/16/13 10:55 PM, Matthew Finkel wrote:
Awesome work with these metrics. I am curious, however, why the new metrics are more correct than the old ones. The graphs are definitely smoother and including the requests to the Dir Auths seems correct, but are there other specific reasons for this choice?
The direct user numbers in the new approach are quite similar to the ones in the old approach. Including directory authorities is a difference, right. But the main reason for switching is that the code is much cleaner, because it is shared with the code for computing bridge user numbers. Another reason is that results are available much faster than in the old approach. But the numbers aren't that different.
However, this is very different for bridge user numbers. The old approach counted total unique IP addresses seen at bridges, where uniqueness was limited to single bridges, so an address seen at several bridges was counted once per bridge. The new approach is quite similar to how we estimate direct user numbers; hence the shared code.
I'm not sure how much detail to give here. Maybe the tech report [0] abstract explains this in some more detail:
""" As part of the Tor Metrics Project, we want to learn how many people use the Tor network on a daily basis. Counting users in an anonymity network is, obviously, a difficult task for which we cannot collect too sensitive usage data. We came up with a privacy-preserving approach for estimating directly connecting user numbers by counting requests to the directory mirrors and deriving approximate user numbers from there. In this report we describe a modified approach for estimating the number of users connecting via bridges by evaluating directory requests made to bridges. We compare this new approach to our current approach that estimates bridge user numbers from total unique IP addresses seen at bridges. We think that results from the new approach are closer to reality, even though that means there are significantly fewer daily bridge users than originally expected. """
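To make the request-based idea concrete, here is a minimal sketch, not the actual metrics code. It assumes that each client makes a fixed number of directory requests per day (`requests_per_client`, set to 10 here purely as an illustrative assumption) and that the reporting relays or bridges see only a fraction of all requests (`reporting_fraction`), so the observed count is scaled up. The function and parameter names are hypothetical.

```python
def estimate_daily_users(observed_requests, reporting_fraction, requests_per_client=10):
    """Estimate daily users from directory requests counted on one day.

    observed_requests: requests counted by the relays/bridges that report statistics.
    reporting_fraction: assumed share (0 < f <= 1) of all requests seen by those nodes.
    requests_per_client: assumed average requests per client per day (illustrative value).
    """
    if not 0 < reporting_fraction <= 1:
        raise ValueError("reporting_fraction must be in (0, 1]")
    if requests_per_client <= 0:
        raise ValueError("requests_per_client must be positive")
    # Scale the observed count up to the whole network, then convert requests to users.
    total_requests = observed_requests / reporting_fraction
    return total_requests / requests_per_client


# Made-up input values, for illustration only.
print(round(estimate_daily_users(4_000_000, 0.5)))  # prints 800000
```

The same function would serve for direct and bridge users under these assumptions, which is one way to picture why the two estimates can share code.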
Let me know if there's something I should explain in more detail.
All the best, Karsten
[0] https://research.torproject.org/techreports/counting-daily-bridge-users-2012...
On Mon, 16 Sep 2013 20:28:21 +0200 Karsten Loesing <karsten@torproject.org> wrote:
Why do I tell you this?
Because the old approach uses resources on our poor, already overloaded metrics machine, and I'm planning to shut down the old approach in the very near future. Here's the plan:
- Compute user numbers for 2012 and before; the current numbers start
on January 1, 2013. This is going to take at least until September 23.
What's stopping us from computing user numbers back to the beginning of recorded data?
On 9/16/13 11:07 PM, Andrew Lewman wrote:
What's stopping us from computing user numbers back to the beginning of recorded data?
The new approach uses directory byte histories, that is, bytes used for answering directory requests, whereas the old approach used the general byte histories. The former have only been available since about September 2010, so that's how far back in time we can compute user numbers with the new approach. Well, maybe a few months later, because not enough relays/bridges reported these directory byte histories right from the start.
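The "maybe a few months later" caveat can be pictured as a threshold on how many relays/bridges report directory byte histories. The following is only a sketch under assumptions: the threshold `min_fraction`, the function name, and the sample values are all made up for illustration and do not come from the real data.

```python
from datetime import date


def first_usable_day(reporting_by_day, min_fraction=0.5):
    """Return the earliest day whose reporting fraction meets the threshold, or None.

    reporting_by_day: maps a date to the (hypothetical) fraction of relays/bridges
    that reported directory byte histories on that day.
    """
    for day in sorted(reporting_by_day):
        if reporting_by_day[day] >= min_fraction:
            return day
    return None


# Made-up fractions, for illustration only; not real measurements.
sample = {
    date(2010, 9, 1): 0.05,
    date(2010, 10, 1): 0.30,
    date(2010, 11, 1): 0.60,
    date(2010, 12, 1): 0.85,
}
print(first_usable_day(sample))
```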
All the best, Karsten
On Mon, Sep 16, 2013 at 08:28:21PM +0200, Karsten Loesing wrote:
Here's the plan:
- Compute user numbers for 2012 and before; the current numbers start
on January 1, 2013. This is going to take at least until September 23.
Sounds good. That sounds like it will resolve Griffin's question too?
Thoughts? Did I miss anything that's worth keeping? Anyone want to create an archive of their favorite graphs before I pull the plug?
I think it would be good to write a paragraph or two to answer Matthew's question -- why are these new numbers different, and what makes us think they're better? I admit I've lost track of the various user counting approaches too.
(Also, is our estimate of August 2013 any more reliable than our estimate of August 2008, in this new approach? That is, are we losing more from the much earlier dates?)
Thanks! --Roger
On Mon, Sep 16, 2013 at 06:02:14PM -0400, Roger Dingledine wrote:
I think it would be good to write a paragraph or two to answer Matthew's question -- why are these new numbers different, and what makes us think they're better?
Speaking of which: https://metrics.torproject.org/users.html#direct-users keeps going up
while https://metrics.torproject.org/users.html#userstats looks like it's going down again
yet https://metrics.torproject.org/network.html#dirbytes looks a lot more like the first curve.
Does this change your mind any?
--Roger
On 9/17/13 8:38 AM, Roger Dingledine wrote:
Does this change your mind any?
Really? Here are the three graphs just for September, all ending on the same day:
https://metrics.torproject.org/users.html?graph=direct-users&start=2013-...
https://metrics.torproject.org/users.html?graph=userstats-relay-country&...
https://metrics.torproject.org/network.html?graph=dirbytes&start=2013-09...
These graphs don't look much different to me.
All the best, Karsten
On Tue, Sep 17, 2013 at 11:53:15AM +0200, Karsten Loesing wrote:
These graphs don't look much different to me.
Ah ha!
The middle one has an intermittent bug where (I assume) it plots a data point on the graph before it has all the numbers for that data point. So periodically it looks like the user counts are falling, but when it updates the graph later it no longer looks like that.
--Roger
On 9/17/13 6:25 PM, Roger Dingledine wrote:
The middle one has an intermittent bug where (I assume) it plots a data point on the graph before it has all the numbers for that data point. So periodically it looks like the user counts are falling, but when it updates the graph later it no longer looks like that.
Ah, how sad. I looked into this a few months back and the last data point didn't seem to change much, but maybe that was just coincidence on the days I looked. Sounds like we'll have to delay results by one more day. I'll take a closer look and change the graph if necessary.
Thanks!
All the best, Karsten
On 9/17/13 6:39 PM, Karsten Loesing wrote:
Ah, how sad. I looked into this a few months back and the last data point didn't seem to change much, but maybe that was just coincidence on the days I looked. Sounds like we'll have to delay results by one more day. I'll take a closer look and change the graph if necessary.
In fact, I found in #8462 that we should cut off the last two days, but then didn't implement that in metrics-web. Fixed.
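For illustration, cutting off the last two days before plotting could look like the sketch below. This is not the metrics-web change itself; the function name, the `(day, value)` data layout, and the sample values are assumptions.

```python
from datetime import date, timedelta


def drop_last_days(points, cutoff_days=2):
    """Drop data points from the most recent `cutoff_days` days in the series.

    points: list of (day, value) tuples; the newest days may still be incomplete.
    """
    if not points:
        return []
    latest = max(day for day, _ in points)
    last_kept = latest - timedelta(days=cutoff_days)
    # Keep only days that are at least `cutoff_days` older than the newest one.
    return [(day, value) for day, value in sorted(points) if day <= last_kept]


# Made-up values; the last two days look like a drop only because data is incomplete.
series = [
    (date(2013, 9, 13), 1000),
    (date(2013, 9, 14), 1100),
    (date(2013, 9, 15), 1200),
    (date(2013, 9, 16), 700),
    (date(2013, 9, 17), 300),
]
print(drop_last_days(series)[-1][0])  # prints 2013-09-15
```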
All the best, Karsten
On 9/17/13 12:02 AM, Roger Dingledine wrote:
On Mon, Sep 16, 2013 at 08:28:21PM +0200, Karsten Loesing wrote:
Here's the plan:
- Compute user numbers for 2012 and before; the current numbers start
on January 1, 2013. This is going to take at least until September 23.
Sounds good. That sounds like it will resolve Griffin's question too?
Depending on how far in the past he needs the user numbers, yes. See also my other reply.
Thoughts? Did I miss anything that's worth keeping? Anyone want to create an archive of their favorite graphs before I pull the plug?
I think it would be good to write a paragraph or two to answer Matthew's question -- why are these new numbers different, and what makes us think they're better? I admit I've lost track of the various user counting approaches too.
I'm going to extend the paragraph on users.html that also contains the link to the tech report.
(Also, is our estimate of August 2013 any more reliable than our estimate of August 2008, in this new approach? That is, are we losing more from the much earlier dates?)
See my earlier replies: graphs will start at the end of 2010 or so. But the old numbers will still be available on data.html.
All the best, Karsten
On 9/17/13 12:02 AM, Roger Dingledine wrote:
I think it would be good to write a paragraph or two to answer Matthew's question -- why are these new numbers different, and what makes us think they're better? I admit I've lost track of the various user counting approaches too.
I thought more about this. If *you* lost track of the various user counting approaches, how is the community supposed to understand what's going on?
I started a Q-and-A which I'm planning to add to the bottom of the metrics page when I retire the old approach:
https://trac.torproject.org/projects/tor/wiki/doc/MetricsUserStatsQAndA
Please, everyone, help make these questions and answers better. It's a wiki page, please add your thoughts!
Thanks!
All the best, Karsten
On 9/16/13 8:28 PM, Karsten Loesing wrote:
Thoughts? Did I miss anything that's worth keeping? Anyone want to create an archive of their favorite graphs before I pull the plug?
Pulled the plug. The users page now shows only the new graphs:
https://metrics.torproject.org/users.html
Scroll down for a few questions and answers explaining the approach.
The old numbers until September 2013 are still available and come with graphing code:
https://metrics.torproject.org/data/old-user-number-estimates.tar.gz
All the best, Karsten