On 5/13/13 9:38 AM, Roger Dingledine wrote:
On Mon, May 13, 2013 at 08:58:27AM +0200, Karsten Loesing wrote:
The only downside I can see is that it takes about 30--45 minutes for new exits to show up in your local cache. An alternative would be to query the exit list yourself, download the most recent consensus, and compile a list of exit addresses yourself.
Speaking of delays: the place that knows about new relays first is each directory authority. Not only that, but they also know what IP address the relay is exiting from, since that's the address the relay publishes its descriptor from. E.g.,
@uploaded-at 2013-05-12 16:57:35
@source "173.246.102.12"
router nthdimension 173.246.101.241 443 0 0
[...]
Seems like this info could provide an alternative, simpler way to generate the exit-addresses file (https://exitlist.torproject.org/exit-addresses), which, if we're doing our modularity right, should be the input to the various other scripts.
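As a rough illustration of what reading these annotations could look like, here is a minimal Python sketch. It assumes that each descriptor in a cached-descriptors file is preceded by its "@" annotation lines, as in the example above, and that the "router" line carries the nickname and listed address as its first two arguments. The function name source_addresses and the dictionary keys are made up for this sketch and are not part of any existing tool.

```python
SAMPLE = '''@uploaded-at 2013-05-12 16:57:35
@source "173.246.102.12"
router nthdimension 173.246.101.241 443 0 0
'''


def source_addresses(text):
    """Yield one dict per descriptor with its listed and source address.

    Assumption: annotation lines ("@keyword value") directly precede the
    "router" line they belong to.
    """
    annotations = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("@"):
            # Split "@source "1.2.3.4"" into keyword and value.
            keyword, _, value = line[1:].partition(" ")
            annotations[keyword] = value.strip('"')
        elif line.startswith("router "):
            parts = line.split()
            if len(parts) < 3:
                # Malformed router line: skip it and drop its annotations.
                annotations = {}
                continue
            yield {
                "nickname": parts[1],
                "listed_address": parts[2],
                "source": annotations.get("source"),
                # True only if the relay uploaded the descriptor itself.
                "uploaded": "uploaded-at" in annotations,
            }
            # Annotations apply to one descriptor only.
            annotations = {}


if __name__ == "__main__":
    for entry in source_addresses(SAMPLE):
        print(entry)
```

The "uploaded" flag is kept separate from the source address so that a caller can ignore descriptors that were not uploaded by the relay itself.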
Interesting. Haven't thought of using that information. metrics-db even has this information available from gabelmoo, because it rsyncs gabelmoo's cached-* files (and v3-status-votes) once per hour. But metrics-db discards all descriptor annotations so far.
However, I don't think this information can replace the information we learn from TorDNSEL or TorBEL. Some concerns:
- Relays may exit from more than just one IP address, but the directory authorities would only see at most one of these addresses. Here's an exit list entry with two exit IP addresses:
ExitNode 49A75EE0B80C1963482FDDFCE579D1A0C568D8BB
Published 2013-05-12 20:59:32
LastStatus 2013-05-12 22:02:59
ExitAddress 46.165.221.166 2013-05-12 22:03:11
ExitAddress 46.166.163.169 2013-05-12 22:03:11
- The directory authorities sometimes download descriptors they don't have from other directory authorities. In that case we don't learn the IP address that the relay exits from. Here's an example:
@downloaded-at 2013-05-12 18:50:10
@source "154.35.32.5"
- The directory authorities are indeed the first to learn these source IP addresses. But we probably don't want arbitrary services to query the authorities frequently for their cached descriptors to learn their annotations. That means we'd have to aggregate and cache this information at another place, which introduces a delay.
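To make the first concern concrete, here is a small Python sketch that reads exit list entries in the format shown above and picks out relays with more than one exit address. It assumes one keyword per line (ExitNode, Published, LastStatus, ExitAddress) and that ExitAddress lines follow the ExitNode line they belong to. The function names parse_exit_list and multi_address_exits are invented for this sketch.

```python
SAMPLE = """ExitNode 49A75EE0B80C1963482FDDFCE579D1A0C568D8BB
Published 2013-05-12 20:59:32
LastStatus 2013-05-12 22:02:59
ExitAddress 46.165.221.166 2013-05-12 22:03:11
ExitAddress 46.166.163.169 2013-05-12 22:03:11
"""


def parse_exit_list(text):
    """Map each relay fingerprint to the list of its exit addresses."""
    exits = {}
    current = None
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue
        if parts[0] == "ExitNode":
            # Start (or continue) the address list for this fingerprint.
            current = exits.setdefault(parts[1], [])
        elif parts[0] == "ExitAddress" and current is not None:
            # parts[1] is the address; the rest is the scan timestamp.
            current.append(parts[1])
    return exits


def multi_address_exits(exits):
    """Return only the relays seen exiting from more than one address."""
    return {fp: addrs for fp, addrs in exits.items() if len(set(addrs)) > 1}


if __name__ == "__main__":
    for fingerprint, addresses in multi_address_exits(parse_exit_list(SAMPLE)).items():
        print(fingerprint, addresses)
```

A single @source annotation could match at most one of the addresses such a relay exits from, which is why the annotation alone would miss part of the picture.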
I guess I should make a trac ticket of this idea. But which component? We sure seem to have a lot of projects that overlap tordnsel / torbel in some way.
For now, I'd say it's an "Analysis" ticket, because we don't yet know how to use this information. If you want to make a ticket, I'll paste my concerns above there.
And you're right that Onionoo overlaps with TorDNSEL/TorBEL to a certain extent. Or rather, it uses their data and presents them in a more convenient way. This wasn't planned, and it would be better if TorDNSEL/TorBEL had a more convenient interface that people could use instead. Until that's the case, people can easily use Onionoo.
Best, Karsten