I have been running a guard for a couple of months with 'CellStatistics' enabled and noticed that the distribution looks heavily skewed:
cell-stats-end 2013-12-20 18:13:10 (86400 s)
cell-processed-cells 1409,9,6,6,6,5,4,3,2,1
cell-queued-cells 0.44,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00
cell-time-in-queue 98,1,1,1,0,13,2,1,1,0
cell-circuits-per-decile 15199
It seems that most circuits with significant traffic end up in the first bucket, and the remaining nine buckets are of little significance. I'm fairly certain that a relative handful of circuits, with cell counts in the tens to hundreds of thousands, account for 99.9% of the cell traffic. Most of that is bot traffic, I suppose.
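To put a rough number on the skew, here is a minimal sketch that turns the reported per-decile means into shares of all processed cells. It assumes every decile holds the same number of circuits (the single cell-circuits-per-decile value), so the means are proportional to the per-decile totals; the function name is only illustrative.

```python
# Per-circuit means copied from the cell-processed-cells line in this
# message, loudest decile first.
processed = [1409, 9, 6, 6, 6, 5, 4, 3, 2, 1]

def decile_shares(means):
    """Return each decile's fraction of all processed cells.

    Assumption: all deciles contain the same number of circuits, so the
    circuit count cancels out and only the means matter.
    """
    total = sum(means)
    return [m / total for m in means]

shares = decile_shares(processed)
print(f"loudest decile: {shares[0]:.1%} of processed cells")  # 97.1%
```

This only describes the split between deciles; it says nothing about how cells are distributed among the circuits inside the loudest decile, which is where the handful of very loud circuits would be hiding.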
Perhaps a log-scaled "loudness" breakdown would make sense?
Nothing pressing here, just an observation and a thought.
On 12/21/13 7:33 PM, starlight@binnacle.cx wrote:
Hi,
You're right that a log-scaled breakdown would be more meaningful than simple deciles. A possible downside is that the first bucket would be pretty small, especially on slow relays. I'm not saying that we shouldn't do it, but we have to be careful not to provide statistics on too small a set of observations. This requires analysis and experimentation.
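As a starting point for that analysis, here is a rough sketch of what log-scaled bucketing with a minimum bucket size could look like. Nothing here is existing Tor code: the function names, the base of 10, and the min_circuits threshold are all assumptions for illustration, and the sample data is made up.

```python
from collections import Counter

def log_bucket(cells, base=10):
    """Bucket index by order of magnitude: 0 for fewer than `base` cells,
    1 for base..base**2-1, and so on. Integer arithmetic avoids
    floating-point log rounding at bucket boundaries."""
    bucket = 0
    while cells >= base:
        cells //= base
        bucket += 1
    return bucket

def log_histogram(cell_counts, base=10, min_circuits=8):
    """Count circuits per log bucket. Buckets with fewer than
    `min_circuits` circuits are reported as None, so that no statistic is
    published about a very small set of observations.
    `min_circuits` is a hypothetical threshold, not a Tor parameter."""
    counts = Counter(log_bucket(c, base) for c in cell_counts)
    return {b: (n if n >= min_circuits else None)
            for b, n in sorted(counts.items())}

# Made-up per-circuit cell counts: many quiet circuits, two very loud ones.
sample = [3] * 500 + [40] * 60 + [250000] * 2
print(log_histogram(sample))  # {0: 500, 1: 60, 5: None}
```

In this toy input the loudest bucket holds only two circuits and is suppressed, which is exactly the small-bucket case that would need care on slow relays.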
One way to do the experimenting part is to run a large *private* Tor network in Shadow, enable the new CELL_STATS event type, and aggregate new cell statistics from logged events. The next step would be to write a proposal, discuss it on this list, write a patch, and get it reviewed and merged into master.
If we spend all this effort, we should look into other changes to cell statistics we want to make. There's already a minor flaw in cell statistics noted in dir-spec.txt: "Note that this statistic can be inaccurate for circuits that had queued cells at the start or end of the measurement interval." If we touch cell statistics, we should fix that issue, too.
More generally, we should step back and decide which questions we want cell statistics to answer, and then design the statistics we need to answer them.
Want to help?
All the best, Karsten