-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA512
Hi,
If I ask two Onionoo instances running the same version (as given in the header of the document) for the same details document, to what extent should they match if relays_published matches, sorting has been taken care of, and, let's assume, both instances use the same MaxMind files and their DNS servers provide the same answers?
One example:
Torproject's instance says: "flags":[]
cthulhu's instance says: "flags":[""]
thanks, nusenu
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA512
Other examples: different views on when last_changed_address_or_port actually happened:
"A4877053906D36F47F0E610DC56E95601123C02A","last_changed_address_or_port":"2015-04-13 14:00:00"
vs.
"A4877053906D36F47F0E610DC56E95601123C02A","last_changed_address_or_port":"2015-04-11 12:00:00"
diff on that field:
< 259D44BDF3734077902CD71606BAD95F994A606B"2015-04-13 08:00:00
---
> 259D44BDF3734077902CD71606BAD95F994A606B"2015-04-11 12:00:00
< 3737F4542BBA0C43345BCD91C4F1E194418B313F"2015-02-14 12:00:00
---
> 3737F4542BBA0C43345BCD91C4F1E194418B313F"2015-02-15 12:00:00
< 9F938AE96C6B63F726BB885E4F2D1319C84A25BB"2015-04-12 14:00:00
---
> 9F938AE96C6B63F726BB885E4F2D1319C84A25BB"2015-04-11 12:00:00
< 4E8CE6F5651E7342C1E7E5ED031E82078134FB0D"2015-01-28 11:00:00
---
> 4E8CE6F5651E7342C1E7E5ED031E82078134FB0D"2015-01-26 03:00:00
< 73AB1555F0DA2E6D6B2AB2A603A8CB34F2981B3D"2014-12-30 20:00:00
---
> 73AB1555F0DA2E6D6B2AB2A603A8CB34F2981B3D"2014-12-30 13:00:00
...
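A per-field comparison like this can be scripted. The following is only a minimal sketch, assuming both details documents have already been downloaded and parsed into Python dicts; the function name diff_field and the sample data are made up for illustration and are not part of Onionoo.

```python
def diff_field(doc_a, doc_b, field):
    """Return {fingerprint: (value_a, value_b)} where `field` differs.

    Only relays present in both details documents are compared.
    """
    a = {r["fingerprint"]: r.get(field) for r in doc_a.get("relays", [])}
    b = {r["fingerprint"]: r.get(field) for r in doc_b.get("relays", [])}
    return {fp: (a[fp], b[fp]) for fp in sorted(a.keys() & b.keys()) if a[fp] != b[fp]}


# Hand-written sample data; the combination of values is invented for the example.
FP = "A4877053906D36F47F0E610DC56E95601123C02A"
instance_a = {"relays": [{"fingerprint": FP, "flags": [],
                          "last_changed_address_or_port": "2015-04-13 14:00:00"}]}
instance_b = {"relays": [{"fingerprint": FP, "flags": [""],
                          "last_changed_address_or_port": "2015-04-11 12:00:00"}]}

for field in ("flags", "last_changed_address_or_port"):
    print(field, diff_field(instance_a, instance_b, field))
```

Keying on the fingerprint makes the result independent of the order in which each instance lists its relays.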
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
On 22/04/15 20:40, nusenu wrote:
> If I ask two Onionoo instances running the same version (as given in the header of the document) for the same details document, to what extent should they match if relays_published matches, sorting has been taken care of, and, let's assume, both instances use the same MaxMind files and their DNS servers provide the same answers?
> One example:
> Torproject's instance says: "flags":[]
> cthulhu's instance says: "flags":[""]
Interesting. These might either be bugs, or one instance was missing a descriptor or consensus that the other instance was processing. In any case, let's try to make outputs as similar as possible.
Mind opening tickets for these two issues (the one above and the one you posted to this list afterwards) and providing more details if available?
Thanks!
All the best, Karsten
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA512
Karsten Loesing:
> These might either be bugs, or one instance was missing a descriptor or consensus that the other instance was processing.
I assume the instances do not process the same descriptors in every case. That would explain most of the differences.
If their 'relays_published' timestamps match, they processed the same consensus, correct?
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
On 25/04/15 12:44, nusenu wrote:
> Karsten Loesing:
>> These might either be bugs, or one instance was missing a descriptor or consensus that the other instance was processing.
> I assume the instances do not process the same descriptors in every case. That would explain most of the differences.
The two instances fetch descriptors from collector.torproject.org at :15 and :18 every hour, respectively. It might be that descriptors are written in the three minutes in between, though it's rather unlikely. But yes, it would explain some differences.
> If their 'relays_published' timestamps match, they processed the same consensus, correct?
That timestamp is updated as the last step of the hourly update process. The details documents that you're fetching may have been updated before. That would also explain some differences.
Thanks for opening tickets for the issues you're finding. I'm going through them on Trac as time permits.
All the best, Karsten
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA512
>> If their 'relays_published' timestamps match, they processed the same consensus, correct?
> That timestamp is updated as the last step of the hourly update process. The details documents that you're fetching may have been updated before. That would also explain some differences.
Wouldn't it make sense to mention/use the timestamp of the consensus that has been used to generate the output instead then?
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
On 25/04/15 13:44, nusenu wrote:
>>> If their 'relays_published' timestamps match, they processed the same consensus, correct?
>> That timestamp is updated as the last step of the hourly update process. The details documents that you're fetching may have been updated before. That would also explain some differences.
> Wouldn't it make sense to mention/use the timestamp of the consensus that has been used to generate the output instead then?
It *is* the timestamp of the consensus that has been used to generate the output. But generating the documents that go into the output is not an atomic step. It's an hourly cronjob that runs for 15--30 minutes and writes documents for all relays to disk, and only after that is done, the relays-published timestamp is updated on disk. As I'm saying on one of the tickets, one way to change this would be to use a database and update all documents in a single transaction, but that's a major change to the current design. Which doesn't mean we shouldn't do it, but it's not trivial, and maybe it's not the most pressing missing feature.
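To illustrate the "single transaction" idea only: the sketch below uses Python's sqlite3 module with a made-up two-table schema (details, meta) and a hypothetical function publish_update. It is not how Onionoo works today (it writes files to disk), and it is not a proposed design, just a demonstration that readers would see either the old documents with the old timestamp or the new documents with the new one.

```python
import json
import sqlite3


def publish_update(conn, relays_published, details_by_fingerprint):
    """Replace all details documents and the timestamp in one transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute("DELETE FROM details")
        conn.executemany(
            "INSERT INTO details (fingerprint, document) VALUES (?, ?)",
            [(fp, json.dumps(doc, sort_keys=True))
             for fp, doc in details_by_fingerprint.items()],
        )
        conn.execute(
            "INSERT OR REPLACE INTO meta (key, value) VALUES ('relays_published', ?)",
            (relays_published,),
        )


# Hypothetical schema, created in memory for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE details (fingerprint TEXT PRIMARY KEY, document TEXT)")
conn.execute("CREATE TABLE meta (key TEXT PRIMARY KEY, value TEXT)")

publish_update(conn, "2015-04-25 18:00:00",
               {"1F515F1D420B498D9687658F4A3D176F88DD4910": {"nickname": "shadowmourne"}})
```

If anything fails halfway through an update, the rollback leaves the previous hour's documents and timestamp in place.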
Want to run your own instance of Onionoo and help develop it? If you need a host for that, we might be able to solve that somehow.
All the best, Karsten
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA512
Karsten Loesing:
>>> That timestamp is updated as the last step of the hourly update process. The details documents that you're fetching may have been updated before. That would also explain some differences.
>> Wouldn't it make sense to mention/use the timestamp of the consensus that has been used to generate the output instead then?
> It *is* the timestamp of the consensus that has been used to generate the output. But generating the documents that go into the output is not an atomic step. It's an hourly cronjob that runs for 15--30 minutes and writes documents for all relays to disk, and only after that is done, the relays-published timestamp is updated on disk. As I'm saying on one of the tickets, one way to change this would be to use a database and update all documents in a single transaction, but that's a major change to the current design. Which doesn't mean we shouldn't do it, but it's not trivial, and maybe it's not the most pressing missing feature.
Would it be a quick and dirty fix to state the timestamp for every record separately (to become "more atomic"), or would that not fix anything, or not be possible? (I have no insight into Onionoo's internals.) By "fix" I mean: a record with a given timestamp should not change over time, and multiple instances would provide the same record for a given timestamp.
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
On 25/04/15 19:58, nusenu wrote:
> Karsten Loesing:
>>>> That timestamp is updated as the last step of the hourly update process. The details documents that you're fetching may have been updated before. That would also explain some differences.
>>> Wouldn't it make sense to mention/use the timestamp of the consensus that has been used to generate the output instead then?
>> It *is* the timestamp of the consensus that has been used to generate the output. But generating the documents that go into the output is not an atomic step. It's an hourly cronjob that runs for 15--30 minutes and writes documents for all relays to disk, and only after that is done, the relays-published timestamp is updated on disk. As I'm saying on one of the tickets, one way to change this would be to use a database and update all documents in a single transaction, but that's a major change to the current design. Which doesn't mean we shouldn't do it, but it's not trivial, and maybe it's not the most pressing missing feature.
> Would it be a quick and dirty fix to state the timestamp for every record separately (to become "more atomic"), or would that not fix anything, or not be possible? (I have no insight into Onionoo's internals.) By "fix" I mean: a record with a given timestamp should not change over time, and multiple instances would provide the same record for a given timestamp.
You mean instead of:
{"version":"2.3",
"relays_published":"2015-04-25 18:00:00",
"relays":[
{"n":"shadowmourne","f":"1F515F1D420B498D9687658F4A3D176F88DD4910","a":["91.219.236.218","80.255.11.213"],"r":true},
{"n":"StinTheHuman","f":"7C05C5D24577CA4C3A904470AE5526C32290FCF0","a":["108.216.89.93"],"r":true}
],
"bridges_published":"2015-04-25 17:52:43",
"bridges":[
]}
something like this (note the "u" field):
{"version":"2.3",
"relays_published":"2015-04-25 18:00:00",
"relays":[
{"n":"shadowmourne","f":"1F515F1D420B498D9687658F4A3D176F88DD4910","a":["91.219.236.218","80.255.11.213"],"r":true,"u":"2014-05-25 18:39:15"},
{"n":"StinTheHuman","f":"7C05C5D24577CA4C3A904470AE5526C32290FCF0","a":["108.216.89.93"],"r":true,"u":"2014-05-25 18:32:50"}
],
"bridges_published":"2015-04-25 17:52:43",
"bridges":[
]}
Yes, that would be possible, but it would not fix the `If-Modified-Since` problem.
However, I think this is probably just a minor problem for most users. It's still a bug, but nothing too serious. (Your use case of finding differences between two Onionoo servers is probably the exception there.)
All the best, Karsten
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA512
> You mean instead of:
> {"version":"2.3",
> "relays_published":"2015-04-25 18:00:00",
> "relays":[
> {"n":"shadowmourne","f":"1F515F1D420B498D9687658F4A3D176F88DD4910","a":["91.219.236.218","80.255.11.213"],"r":true},
> {"n":"StinTheHuman","f":"7C05C5D24577CA4C3A904470AE5526C32290FCF0","a":["108.216.89.93"],"r":true}
> ],
> "bridges_published":"2015-04-25 17:52:43",
> "bridges":[
> ]}
> something like this (note the "u" field):
> {"version":"2.3",
> "relays_published":"2015-04-25 18:00:00",
> "relays":[
> {"n":"shadowmourne","f":"1F515F1D420B498D9687658F4A3D176F88DD4910","a":["91.219.236.218","80.255.11.213"],"r":true,"u":"2014-05-25 18:39:15"},
> {"n":"StinTheHuman","f":"7C05C5D24577CA4C3A904470AE5526C32290FCF0","a":["108.216.89.93"],"r":true,"u":"2014-05-25 18:32:50"}
> ],
> "bridges_published":"2015-04-25 17:52:43",
> "bridges":[
> ]}
Yes, one timestamp per record, but I'm not entirely sure what the timestamps here represent. I expected something more like 2015-04-25 XX:00:00 (one-hour granularity for consensus timestamps), but not necessarily matching the one in the relays_published entry.
> However, I think this is probably just a minor problem for most users. It's still a bug, but nothing too serious. (Your use case of finding differences between two Onionoo servers is probably the exception there.)
My use case will be "feed details.json into a db once an hour and use relays_published as the 'consensus timestamp' that this data comes from". But the assumption 'this comes from consensus X' is not valid in all cases, as we know. So I would use the per-record timestamp to find out which consensus I'm "currently looking at" (in case something like this gets added).
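As a rough sketch of that import step: the per-record "u" field exists only as a proposal in this thread, not in Onionoo, and its value format is assumed here to be a consensus valid-after time. The function name rows_for_import is hypothetical, and the sample uses the short field names of the summary document quoted above (details documents use "fingerprint" instead of "f").

```python
def rows_for_import(document, record_ts_field="u"):
    """Yield (timestamp, fingerprint, record) tuples for a database import.

    Uses the hypothetical per-record timestamp when present and falls back
    to the document-wide relays_published value otherwise.
    """
    fallback = document["relays_published"]
    for record in document.get("relays", []):
        yield (record.get(record_ts_field, fallback), record["f"], record)


# Sample document; the "u" value is invented for the example.
sample = {
    "relays_published": "2015-04-25 18:00:00",
    "relays": [
        {"n": "shadowmourne", "f": "1F515F1D420B498D9687658F4A3D176F88DD4910",
         "u": "2015-04-25 17:00:00"},
        {"n": "StinTheHuman", "f": "7C05C5D24577CA4C3A904470AE5526C32290FCF0"},
    ],
}

for timestamp, fingerprint, _ in rows_for_import(sample):
    print(timestamp, fingerprint)
```

With a fallback like this, the same import code keeps working whether or not an instance provides the extra field.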
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
On 25/04/15 22:37, nusenu wrote:
>> You mean instead of:
>> {"version":"2.3",
>> "relays_published":"2015-04-25 18:00:00",
>> "relays":[
>> {"n":"shadowmourne","f":"1F515F1D420B498D9687658F4A3D176F88DD4910","a":["91.219.236.218","80.255.11.213"],"r":true},
>> {"n":"StinTheHuman","f":"7C05C5D24577CA4C3A904470AE5526C32290FCF0","a":["108.216.89.93"],"r":true}
>> ],
>> "bridges_published":"2015-04-25 17:52:43",
>> "bridges":[
>> ]}
>> something like this (note the "u" field):
>> {"version":"2.3",
>> "relays_published":"2015-04-25 18:00:00",
>> "relays":[
>> {"n":"shadowmourne","f":"1F515F1D420B498D9687658F4A3D176F88DD4910","a":["91.219.236.218","80.255.11.213"],"r":true,"u":"2014-05-25 18:39:15"},
>> {"n":"StinTheHuman","f":"7C05C5D24577CA4C3A904470AE5526C32290FCF0","a":["108.216.89.93"],"r":true,"u":"2014-05-25 18:32:50"}
>> ],
>> "bridges_published":"2015-04-25 17:52:43",
>> "bridges":[
>> ]}
> Yes, one timestamp per record, but I'm not entirely sure what the timestamps here represent. I expected something more like 2015-04-25 XX:00:00 (one-hour granularity for consensus timestamps), but not necessarily matching the one in the relays_published entry.
>> However, I think this is probably just a minor problem for most users. It's still a bug, but nothing too serious. (Your use case of finding differences between two Onionoo servers is probably the exception there.)
> My use case will be "feed details.json into a db once an hour and use relays_published as the 'consensus timestamp' that this data comes from". But the assumption 'this comes from consensus X' is not valid in all cases, as we know. So I would use the per-record timestamp to find out which consensus I'm "currently looking at" (in case something like this gets added).
https://trac.torproject.org/projects/tor/ticket/15848
All the best, Karsten