Hi,
I'm trying to figure out why a list from [TorBulkExitList.py] is so much larger than what is seen in [exit-addresses].
Case in point: earlier today the list from TorBulkExitList.py contained 58% more addresses than exit-addresses:
--8<---------------cut here---------------start------------->8---
$ curl-tor -q 'https://check.torproject.org/cgi-bin/TorBulkExitList.py?ip=8.8.8.8' | egrep -v ^# | wc -l
1554
$ curl-tor -q https://check.torproject.org/exit-addresses | egrep ^ExitNode | wc -l
985
--8<---------------cut here---------------end--------------->8---
Counting relays with the Exit flag in a consensus of roughly the same time gives 912:
--8<---------------cut here---------------start------------->8---
$ curl-tor -q http://171.25.193.9:443/tor/status-vote/current/consensus | egrep '^s.*Exit' | wc -l
912
--8<---------------cut here---------------end--------------->8---
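One caveat about the pattern above: `^s.*Exit` also matches status lines that carry only the BadExit flag, so an exact flag comparison may give a slightly lower number. Below is a minimal Python sketch of that comparison. The consensus excerpt is made up and heavily abbreviated (the "r" lines are placeholders), and `count_exit_flagged` is a hypothetical helper, not part of any Tor tool.

```python
import re


def count_exit_flagged(consensus_text):
    """Count 's' lines whose flag list contains the exact token 'Exit'."""
    count = 0
    for line in consensus_text.splitlines():
        if not line.startswith("s "):
            continue
        flags = line.split()[1:]  # drop the leading "s"
        if "Exit" in flags:
            count += 1
    return count


# Made-up excerpt for illustration only; real "r" lines carry more fields.
SAMPLE = """\
r relayA placeholder
s Exit Fast Running Valid
r relayB placeholder
s BadExit Fast Running Valid
r relayC placeholder
s Fast Guard Running Valid
r relayD placeholder
s BadExit Exit Running Valid
"""

# Same pattern as the egrep call: matches any 's' line containing "Exit".
regex_count = len(re.findall(r"^s.*Exit", SAMPLE, flags=re.MULTILINE))
exact_count = count_exit_flagged(SAMPLE)

print("regex match:", regex_count)
print("exact Exit flag:", exact_count)
```

In this sample the regex counts the BadExit-only relay as well, so the two numbers differ by one.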
If I read the code [check.go][exitips.py] correctly, we include the following IP addresses in what is served from [TorBulkExitList.py]:
- all routers in the current consensus
- with exit addresses from TorDNSEL file(s) added
- for which stem's exit_policy.is_exiting_allowed() returns true
I haven't read any TorDNSEL code and don't know exactly what the above-mentioned TorDNSEL file(s) are, but I think they include results from active testing of which addresses exit relays really use for exiting.
[exit-addresses] seems to come from TorDNSEL too, but it holds more of a snapshot, while the files mentioned above supposedly cover some period of time.
Before I dig further, perhaps someone here already knows why the numbers differ so much? I wouldn't expect the churn of addresses used for exiting to be high enough to explain the difference.
[TorBulkExitList.py] https://check.torproject.org/cgi-bin/TorBulkExitList.py?ip=8.8.8.8
[exit-addresses] https://check.torproject.org/exit-addresses
[check.go] https://gitweb.torproject.org/check.git/tree/check.go
[exitips.py] https://gitweb.torproject.org/check.git/tree/scripts/exitips.py
Thanks, Linus
On Mar 17, 2016, at 3:45 AM, Linus Nordberg linus@torproject.org wrote:
Hi,
I'm trying to figure out why a list from [TorBulkExitList.py] is so much larger than what is seen in [exit-addresses].
Let's certainly not rule out the possibility of a bug.
Case in point: earlier today the list from TorBulkExitList.py contained 58% more addresses than exit-addresses:
--8<---------------cut here---------------start------------->8---
$ curl-tor -q 'https://check.torproject.org/cgi-bin/TorBulkExitList.py?ip=8.8.8.8' | egrep -v ^# | wc -l
1554
$ curl-tor -q https://check.torproject.org/exit-addresses | egrep ^ExitNode | wc -l
985
Shouldn't this grep `ExitAddress`? (which only adds 3 right now)
--8<---------------cut here---------------end--------------->8---
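To make the ExitNode versus ExitAddress question concrete: in the exit-addresses format, each relay has one ExitNode line followed by one or more ExitAddress lines, so the two patterns count different things. Here is a minimal Python sketch that reports both, plus the number of distinct IPs. The sample entries and fingerprints are invented for illustration, and `summarize_exit_addresses` is a hypothetical helper.

```python
def summarize_exit_addresses(text):
    """Return (relay entries, ExitAddress lines, distinct exit IPs)."""
    nodes = 0
    address_lines = 0
    unique_ips = set()
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "ExitNode":
            nodes += 1
        elif fields[0] == "ExitAddress" and len(fields) >= 2:
            address_lines += 1
            unique_ips.add(fields[1])  # second field is the IP address
    return nodes, address_lines, len(unique_ips)


# Invented sample data; fingerprints and addresses are not real relays.
SAMPLE = """\
ExitNode 0000000000000000000000000000000000000001
Published 2016-03-16 20:00:00
LastStatus 2016-03-16 21:00:00
ExitAddress 192.0.2.1 2016-03-16 21:05:00
ExitAddress 192.0.2.2 2016-03-16 21:06:00
ExitNode 0000000000000000000000000000000000000002
Published 2016-03-16 19:00:00
LastStatus 2016-03-16 21:00:00
ExitAddress 192.0.2.1 2016-03-16 21:07:00
"""

nodes, address_lines, distinct = summarize_exit_addresses(SAMPLE)
print("ExitNode lines:", nodes)
print("ExitAddress lines:", address_lines)
print("distinct exit IPs:", distinct)
```

Since TorBulkExitList.py serves addresses rather than relays, the distinct-IP figure is probably the fairest one to compare against its output.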
Counting relays with the Exit flag in a consensus of roughly the same time gives 912:
--8<---------------cut here---------------start------------->8---
$ curl-tor -q http://171.25.193.9:443/tor/status-vote/current/consensus | egrep '^s.*Exit' | wc -l
912
--8<---------------cut here---------------end--------------->8---
If I read the code [check.go][exitips.py] correctly, we include the following IP addresses in what is served from [TorBulkExitList.py]:
- all routers in the current consensus
- with exit addresses from TorDNSEL file(s) added
- for which stem's exit_policy.is_exiting_allowed() returns true
It's more like,
- all routers in the past 16 consensuses
- for which stem's exit_policy.is_exiting_allowed() returns true
but it also includes all IPs it has ever known for a relay (https://gitweb.torproject.org/check.git/commit/?id=026e43b08656d78398b15742d...), which seems a little suspect.
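To illustrate how this could inflate the list, here is a simplified Python model of the behaviour described above. It is not the actual exitips.py logic: the data layout (one dict per consensus mapping fingerprint to an address and an exit-allowed flag), the function name `bulk_exit_ips`, and the sample relays are all assumptions made for the sketch.

```python
from collections import defaultdict


def bulk_exit_ips(consensuses, window=16):
    """Model: union of every IP ever seen for relays that allow exiting
    in any of the last `window` consensuses.

    `consensuses` is ordered oldest to newest; each maps
    fingerprint -> (ip, allows_exit).
    """
    known_ips = defaultdict(set)  # every address ever seen per relay
    for consensus in consensuses:
        for fingerprint, (ip, _allows_exit) in consensus.items():
            known_ips[fingerprint].add(ip)

    result = set()
    for consensus in consensuses[-window:]:
        for fingerprint, (_ip, allows_exit) in consensus.items():
            if allows_exit:
                result |= known_ips[fingerprint]
    return result


# Invented history: relay A changes address, relay B drops out.
history = [
    {"A": ("198.51.100.1", True), "B": ("198.51.100.10", True)},
    {"A": ("198.51.100.2", True)},
]

bulk = bulk_exit_ips(history)
snapshot = {ip for ip, allows_exit in history[-1].values() if allows_exit}

print("bulk list size:", len(bulk))
print("latest snapshot size:", len(snapshot))
```

Under this model, a relay that changed address or recently left the consensus still contributes addresses, which is one way the bulk list could grow well past a snapshot.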
I haven't read any TorDNSEL code and don't know exactly what the above-mentioned TorDNSEL file(s) are, but I think they include results from active testing of which addresses exit relays really use for exiting.
Yup, that's it.
[exit-addresses] seems to come from TorDNSEL too, but it holds more of a snapshot, while the files mentioned above supposedly cover some period of time.
Yes.
Before I dig further, perhaps someone here already knows why the numbers differ so much? I wouldn't expect the churn of addresses used for exiting to be high enough to explain the difference.
If the numbers don't add up, can you file a bug?