tor-dev October 2015

tor-dev@lists.torproject.org

64 participants
82 discussions

Status of remaining SVN repositories
by Jens Kubieziel 27 Oct '15

Hi,

Tor has an SVN with several repositories in it. Ticket #4929 deals with migrating them to git (<URL:https://trac.torproject.org/projects/tor/ticket/4929>). I made a table within the ticket to track the current status. Most of the repositories are in git right now. However, some remain whose current status I don't know. These are:

- blossom
- bsockets
- incognito
- libevent-urz
- topf
- projects (it seems sub repos are in git)
- website

Should the first five repos also be moved to trac, or what do we want to do with them?

-- 
Jens Kubieziel http://www.kubieziel.de
A black cat on the way to the gallows brings bad luck. Werner Mitsch

2 1

A layered transport
by Da Feng 26 Oct '15

Hi: I've discovered that the GFW normally doesn't block https protocols. We can use an https front tier to distribute connections to actual bridges. The front tier encrypts an internal address identifier with its private key (no matching public key or public algorithm) and returns the encrypted identifier to the user; part of it also includes the user's chosen password. Then, when submitting requests, the user encrypts items such as the timestamp and browser headers again with that password. The request line sent to the https server looks no different from an ordinary one and includes both the user-encrypted item and the front-tier-encrypted item. After the connection is established, data is relayed inside https between bridge and user.

4 3

Onion Services and NAT Punching
by Tim Wilson-Brown - teor 26 Oct '15

Hi All,

Do you know a use case which needs Single Onion Services and NAT punching?

We’re wondering if there are mobile or desktop applications / services that would use a single onion service for the performance benefits, but still need NAT punching. (And don’t need the anonymity of a hidden service.)

Single Onion Services:
* can’t do NAT punching, (they need an ORPort on a publicly accessible IP address),
* locations are easier to discover, and
* have lower latency.

Hidden Services:
* can do NAT punching,
* locations are hard to discover, and
* have higher latency.

Are there any use cases that:
* need NAT punching,
* don’t need service location anonymity, and
* would benefit from lower latency?

Thanks

Tim

Tim Wilson-Brown (teor)

teor2345 at gmail dot com
PGP 968F094B

teor at blah dot im
OTR CAD08081 9755866D 89E2A06F E3558B7F B5A9D14F

6 6

txtorcon 0.14.0
by meejah 25 Oct '15

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I'm happy to announce txtorcon 0.14.0. Changes:

 * IStreamAttacher handling was missing None and DO_NOT_ATTACH cases if a Deferred was returned.
 * add .is_built Deferred to txtorcon.Circuit that gets callback()'d when the circuit becomes BUILT
 * David Stainton ported his "tor:" endpoint parser so now both client and server endpoints are supported. This means **any** Twisted program using endpoints can use Tor as a client. For example, to connect to txtorcon's Web site: ep = clientFromString("tor:timaq4ygg2iegci7.onion:80"). (In the future, I'd like to automatically launch Tor if required, too).
 * Python3 fixes from Isis Lovecruft (note: needs Twisted 15.4.0+)

You can download the release from PyPI or GitHub (or of course "pip install txtorcon"):

   https://pypi.python.org/pypi/txtorcon/0.14.0
   https://github.com/meejah/txtorcon/releases/tag/v0.14.0

Releases are also available from the hidden service:

   http://timaq4ygg2iegci7.onion/txtorcon-0.14.0.tar.gz
   http://timaq4ygg2iegci7.onion/txtorcon-0.14.0.tar.gz.asc
   http://timaq4ygg2iegci7.onion/txtorcon-0.14.0-py2-none-any.whl
   http://timaq4ygg2iegci7.onion/txtorcon-0.14.0-py2-none-any.whl.asc

You can verify the sha256sum of both by running the following 4 lines in a shell wherever you have the files downloaded:

cat <<EOF | sha256sum --check
d44be978dd9521f22333edea49789fe7e19c4bea9a02d63e6ec826d08fb571d1  dist/txtorcon-0.14.0-py2-none-any.whl
a2d0fae65da015840bb392ffc4fd63918168edb6b634941f6b8aa843b338edbf  dist/txtorcon-0.14.0.tar.gz
EOF

thanks,
meejah
-----BEGIN PGP SIGNATURE-----

iQEcBAEBAgAGBQJWB3FqAAoJEMJgKAMSgGmn1fsIAOcqJdIFIO5cIvj7TRv9rxFP
aEW/Vb0UGN2/A2skWxajzYJLeZ0geZaOmYGFyscs+fguIzRaGRX4G8+7xRuIzzim
GqjPd+my2+h79hDpH/RntwQvNmW9K50UCKOH5TiW5fGnjbO0ZkqoK8ln+aGz7WCC
HjRmNTJb7iI/vVs6h66evBHjkgib4lBpZSEnJR6H+c1ZC5hJ0uYt1SFch5IF/UsA
QeISGyc2SE5wuwjLuiHDyPoFC/Q5IEhDWABFgs3Z6LB/2yfnUFEjrtEla1Uq454H
YJ5mWsgxx/apGDXbACQ3p8N5ISA/WDOWHm2Rg+XgZo758KyMyOQIfSy1RrGBld4=
=Wffv
-----END PGP SIGNATURE-----
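
For readers without sha256sum at hand, the same check can be sketched in Python with only the standard library. This is a minimal sketch and not part of the txtorcon release: the helper names sha256_of and verify are made up for illustration, and the dist/ paths assume the layout used in the announcement.

```python
import hashlib
from pathlib import Path

# Digests copied from the announcement; the dist/ paths are an assumption
# about where the downloaded files live.
EXPECTED = {
    "dist/txtorcon-0.14.0-py2-none-any.whl":
        "d44be978dd9521f22333edea49789fe7e19c4bea9a02d63e6ec826d08fb571d1",
    "dist/txtorcon-0.14.0.tar.gz":
        "a2d0fae65da015840bb392ffc4fd63918168edb6b634941f6b8aa843b338edbf",
}


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(expected):
    """Map each path to True (match), False (mismatch) or None (file missing)."""
    results = {}
    for path, want in expected.items():
        if not Path(path).is_file():
            results[path] = None
        else:
            results[path] = sha256_of(path) == want
    return results


if __name__ == "__main__":
    labels = {True: "OK", False: "FAILED", None: "missing"}
    for path, outcome in verify(EXPECTED).items():
        print(f"{path}: {labels[outcome]}")
```

Note that this only checks file integrity against the listed digests; checking the .asc signatures still needs an OpenPGP tool.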

1 1

[PATCH] Document our current guard selection algorithm in path-spec.txt.
by isis 23 Oct '15

Hey hey, I've been working on documenting our current guard selection algorithm (#17261), [0] which as most of you already know, has some room for improvement. The patch is in my bug17261 branch. [1] However, it's also attached here for reference and discussion. [0]: https://trac.torproject.org/projects/tor/ticket/17261 [1]: https://gitweb.torproject.org/user/isis/torspec.git/log/?h=bug17261 Best, -- ♥Ⓐ isis agora lovecruft _________________________________________________________ OpenPGP: 4096R/0A6A58A14B5946ABDE18E207A3ADB67A2CDB8B35 Current Keys: https://blog.patternsinthevoid.net/isis.txt

1 0

Re: [tor-dev] Load Balancing in 2.7 series - incompatible with OnionBalance ?
by Alec Muffett 23 Oct '15

info(a)tvdw.eu wrote:
> Hi Alec,

Hi Tom! I love your proposal, BTW. :-)

> Most of what you said sounds right, and I agree that caching needs TTLs (not just here, all caches need to have them, always).

Thank you!

> However, you mention that one DC going down could cause a bad experience for users. In most HA/DR setups I've seen there should be enough capacity if something fails, is that not the case for you? Can a single data center not serve all Tor traffic?

It's not the datacentre which worries me - we already know how to deal with those - it's the failure-based resource contention for the limited introduction-point space that is afforded by a maximum (?) of six descriptors each of which cites 10 introduction points.

A cap of 60 IPs is a clear protocol bottleneck which - even with your excellent idea - could break a service deployment.

Yes, in the meantime the proper solution is to split the service three ways, or even four, but that's administrative burden which less well-resourced organisations might struggle with.

Many (most?) will have a primary site and a single failover site, and it seems perverse that they could bounce just ONE of those sites and automatically lose 50% of their Onion capacity for up to 24 hours UNLESS they also take down the OTHER site for long enough to invalidate the OnionBalance descriptors.

Such is not the description of a high-availability (HA) service, and it might put people off.

> If that is a problem, I would suggest adding more data centers to the pool. That way if one fails, you don't lose half of the capacity, but a third (if N=3) or even a tenth (if N=10).

...but you lose it for 1..24 hours, even if you simply reboot the Tor daemon.

> Anyway, such a thing is probably off-topic. To get back to the point about TTLs, I just want to note that retrying failed nodes until all fail is scary:

I find that worrying, also. I'm not sure what I think about it yet, though.

> what will happen if all ten nodes get a 'rolling restart' throughout the day? Wouldn't you eventually end up with all the traffic on a single node, as it's the only one that hadn't been restarted yet?

Precisely.

> As far as I can see the only thing that can avoid holes like that is a TTL, either hard coded to something like an hour, or just specified in the descriptor. Then, if you do a rolling restart, make sure you don't do it all within one TTL length, but at least two or three depending on capacity.

Concur.

desnacked(a)riseup.net wrote:
> Please see rend_client_get_random_intro_impl(). Clients will pick a random intro point from the descriptor which seems to be the proper behavior here.

That looks great!

> I can see how a TTL might be useful in high availability scenarios like the one you described. However, it does seem like something with potential security implications (like, set TTL to 1 second for all your descriptors, and now you have your clients keep on making directory circuits to fetch your descs).

Okay, so, how about:

IDEA: if ANY descriptor introduction point connection fails AND the descriptor's ttl has been exceeded THEN refetch the descriptor before trying again?

It strikes me (though I may be wrong?) that the degenerate case for this would be someone with an onion killing their IP in order to force the user to refetch a descriptor - which is what I think would happen anyway? At very least this proposal would add a work factor.

> For this reason I'd be interested to see this specified in a formal Tor proposal (or even as a patch to prop224). It shouldn't be too big! :)

I would hesitate to add it to Prop 224 which strikes me as rather large and distant. I'd love to see this by Christmas :-P

teor2345(a)gmail.com wrote:
> Do we connect to introduction points in the order they are listed in the descriptor? If so, that's not ideal, there are surely benefits to a random choice (such as load balancing).

Apparently not (re: George) :-)

> That said, we believe that rendezvous points are the bottleneck in the rendezvous protocol, not introduction points.

Currently, and in most current deployments, yes.

> However, if you were to use proposal #255 to split the introduction and rendezvous to separate tor instances, you would then be limited to:
> - 6*10*N tor introduction points, where there are 6 HSDirs, each receiving 10 different introduction points from different tor instances, and N failover instances of this infrastructure competing to post descriptors. (Where N = 1, 2, 3.)
> - a virtually unlimited number of tor servers doing the rendezvous and exchanging data (say 1 server per M clients, where M is perhaps 100 or so, but ideally dynamically determined based on load/response time).
> In this scenario, you could potentially overload the introduction points.

Exactly my concern, especially when combined with overlong lifetimes of mostly-zombie descriptors.

- alec
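
The IDEA in this message (refetch only when an introduction point has failed AND the descriptor's TTL has been exceeded) can be written down as a small decision function. This is a hedged sketch for discussion, not Tor code: the CachedDescriptor type, its ttl field and should_refetch are hypothetical names, and descriptors did not carry a TTL at the time of the thread. The constants restate the "six descriptors, 10 introduction points each" figures quoted above, which the message itself marks as uncertain.

```python
import time
from dataclasses import dataclass

# Figures quoted in the thread (the "maximum (?)" is the author's own hedge).
MAX_DESCRIPTORS = 6
INTRO_POINTS_PER_DESCRIPTOR = 10
MAX_INTRO_POINTS = MAX_DESCRIPTORS * INTRO_POINTS_PER_DESCRIPTOR  # cap of 60 IPs


@dataclass
class CachedDescriptor:
    """Hypothetical client-side cache entry for an onion service descriptor."""
    fetched_at: float  # time the client fetched the descriptor, in seconds
    ttl: float         # hypothetical TTL in seconds, as floated in the thread


def should_refetch(desc, intro_failed, now=None):
    """Refetch only if an intro point connection failed AND the TTL expired."""
    if now is None:
        now = time.time()
    expired = (now - desc.fetched_at) > desc.ttl
    return intro_failed and expired


if __name__ == "__main__":
    desc = CachedDescriptor(fetched_at=0.0, ttl=3600.0)
    # A failure shortly after fetching does not trigger a refetch ...
    print(should_refetch(desc, intro_failed=True, now=600.0))
    # ... but a failure after the TTL has passed does.
    print(should_refetch(desc, intro_failed=True, now=7200.0))
```

Requiring both conditions is what adds the "work factor" mentioned above: an expired TTL alone causes no directory traffic, and a failing introduction point alone cannot force a refetch of a fresh descriptor.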

3 4

Hello, after about one hour since my Tor browser successfully connected to Tor network with a obfs4 bridge, I couldn't open any webpage through Tor proxy.
by Li Xiaodong 22 Oct '15

Hello, about one hour after my Tor Browser successfully connected to the Tor network through an obfs4 bridge, I could no longer open any webpage through the Tor proxy. After I restarted Tor Browser, it worked normally again. Does China's firewall interfere with obfs4 bridges? Thank you very much for your help. I really appreciate it.

2 1

Faster Bootstrap - Prop #210 (Revised)
by Tim Wilson-Brown - teor 22 Oct '15

Hi All,

I have revised proposal #210 - Faster Headless Consensus Bootstrapping today, after a number of discussions with Peter Palfrader, Nick Mathewson, Mike Perry, and others.

This proposal aims to improve tor’s consensus download behaviour when the authorities (or directory mirrors) are down. It has tor initiate multiple concurrent consensus connections, then download the consensus through the first TLS connection that completes.

This proposal is a solution to bug #4483 - If k of n authorities are down, k/n bootstrapping clients are delayed for minutes. It is also needed to implement #15775 - Add IPv4 Fallback Directory List to tor... and #8374 - Ship list of fallback directory mirrors on long-term fixed IPv6 addresses.

The key changes are:
* modify the scheme to perform exponential backoff on connections, rather than connections in batches
* modify the scheme to enable IPv6 bootstrap on IPv6-only clients (see also #17217 - Change clients to automatically use IPv6 if they can bootstrap over it)
* specify a way that clients can still benefit from clock verification via TLS connections with the authorities, without downloading the entire consensus from the authorities if it is available sooner from a mirror
* analyse the expected failure rate and additional connection load imposed by this proposal

The full text is included below, and a branch with these changes is available as bootstrap-exponential-backoff in https://github.com/teor2345/torspec.git

Please feel free to respond here, or on the #4483 ticket at https://trac.torproject.org/projects/tor/ticket/4483

Thanks

Tim

-----
Filename: 210-faster-headless-consensus-bootstrap.txt
Title: Faster Headless Consensus Bootstrapping
Author: Mike Perry, Tim Wilson-Brown, Peter Palfrader
Created: 01-10-2012
Last Modified: 02-10-2015
Status: Open
Target: 0.2.8.x+

Overview and Motivation

This proposal describes a way for clients
to fetch the initial consensus more quickly in situations where some or all of the directory authorities are unreachable. This proposal is meant to describe a solution for bug #4483.

Design: Bootstrap Process Changes

The core idea is to attempt to establish bootstrap connections in parallel during the bootstrap process, and download the consensus from the first connection that completes.

Connection attempts will be performed on an exponential backoff basis. Initially, connections will be performed to a randomly chosen hard coded directory mirror and a randomly chosen canonical directory authority. If neither of these connections complete, additional mirror and authority connections are tried. Mirror connections are tried at a faster rate than authority connections.

We specify that mirror connections retry after one second, and then double the retry time with every connection: 0, 1, 2, 4, 8, 16, 32, ...

We specify that directory authority connections retry after 10 seconds, and then double the retry time with every connection: 0, 10, 20, ...

If the client has both an IPv4 and IPv6 address, we try IPv4 and IPv6 mirrors and authorities on the following schedule: IPv4, IPv6, IPv4, IPv6, ...

We try IPv4 first to avoid overloading IPv6-enabled authorities and mirrors. Mirrors and auths get a separate IPv4/IPv6 schedule. This ensures that we try an IPv6 authority within the first 10 seconds. This helps implement #8374 and related tickets.

The maximum retry time for both timers is 3 days + 1 hour. This places a small load on the mirrors and authorities, while allowing a client that regains a network connection to eventually download a consensus.

The retry timers must reset on HUP and any network reachability events, [ TODO: do we have network reachability events? ] so that clients that have unreliable networks can recover from network failures.
The first connection to complete will be used to download the consensus document and the others will be closed, after which bootstrapping will proceed as normal.

A benefit of connecting to directory authorities is that clients are warned if their clock is wrong. Therefore, when closing a directory authority connection, we check to see if we have successfully connected to an authority during this run of the Tor client. If not, we allow the authority TLS connection to complete, then close the connection.

We expect the vast majority of clients to succeed within 4 seconds, after making up to 4 connection attempts to mirrors and 1 connection attempt to an authority. Clients which can't connect in the first 10 seconds will try 1 more mirror, then try to contact another directory authority. We expect almost all clients to succeed within 10 seconds. This is a much better success rate than the current Tor implementation, which fails k/n of clients if k of the n directory authorities are down. (Or, if the connection fails in certain ways, (k/n)^2.)

If at any time the total number of outstanding bootstrap connection attempts exceeds 10, no new connection attempts are to be launched until an existing connection attempt experiences full timeout. The retry time is not doubled when a connection is skipped.

Design: Fallback Dir Mirror Selection

The set of hard coded directory mirrors from #572 shall be chosen using the 100 Guard nodes with the longest uptime.

The fallback weights will be set using each mirror's fraction of consensus bandwidth out of the total of all 100 mirrors, adjusted to ensure no fallback directory sees more than 10% of clients. We will also exclude fallback directories that are less than 1/1000 of the consensus weight, as they are not large enough to make it worthwhile including them.

This list of fallback dir mirrors should be updated with every major Tor release.
In future releases, the number of dir mirrors should be set at 20% of the current Guard nodes (approximately 200 as of October 2015), rather than fixed at 100.

Performance: Additional Load with Current Parameter Choices

This design and the connection count parameters were chosen such that no additional bandwidth load would be placed on the directory authorities. In fact, the directory authorities should experience less load, because they will not need to serve the consensus document for a connection in the event that one of the directory mirrors completes its connection before the directory authority does.

However, the scheme does place additional TLS connection load on the fallback dir mirrors. Because bootstrapping is rare and all but one of the TLS connections will be very short-lived and unused, this should not be a substantial issue.

The dangerous case is in the event of a prolonged consensus failure that induces all clients to enter into the bootstrap process. In this case, the number of TLS connections to the fallback dir mirrors within the first second would be 2*C/100, or 40,000 for C=2,000,000 users. If no connections complete before the 10 retries, 7 of which go to mirrors, this could reach as high as 140,000 connection attempts, but this is extremely unlikely to happen in full aggregate.

However, in the no-consensus scenario today, the directory authorities would already experience 2*C/9 or 444,444 connection attempts. (Tor currently tries 2 authorities, before delaying the next attempt.) The 10-retry scheme, 3 of which go to authorities, increases their total maximum load to about 666,666 connection attempts, but again this is unlikely to be reached in aggregate. Additionally, with this scheme, even if the dirauths are taken down by this load, the dir mirrors should be able to survive it.

Implementation Notes: Code Modifications

The implementation of the bootstrap process is unfortunately mixed in with many types of directory activity.
The process starts in update_consensus_networkstatus_downloads(), which initiates a single directory connection through directory_get_from_dirserver(). Depending on bootstrap state, a single directory server is selected and a connection is eventually made through directory_initiate_command_rend().

There appear to be a few options for altering this code to retry multiple simultaneous connections. Without refactoring, one approach would be to set a connection retry helper function timer in directory_initiate_command_routerstatus() from directory_get_from_dirserver() if the purpose is DIR_PURPOSE_FETCH_CONSENSUS and the only directory servers available are the authorities and the fallback dir mirrors. (That is, there is no valid consensus.)

The retry helper function would check the list of pending connections and, if it is 10 or greater, skip the connection attempt, and leave the retry time constant.

The code in directory_initiate_command_rend() would then need to be altered to maintain a list of the dircons created for this purpose as well as avoid immediately queuing the directory_send_command() request for the DIR_PURPOSE_FETCH_CONSENSUS purpose. A flag would need to be set on the dircon to be checked in connection_dir_finished_connecting().

The function connection_dir_finished_connecting() would need to be altered to examine the list of pending dircons, determine if this one is the first to complete, and if so, then call directory_send_command() to download the consensus and close the other pending dircons. connection_dir_finished_connecting() would also cancel the timer.

Reliability Analysis

We make the pessimistic assumptions that 50% of connections to directory mirrors fail, and that 20% of connections to authorities fail. (Actual figures depend on relay churn, age of the fallback list, and authority uptime.)
We expect the first 10 connection retry times to be:

Mirror:   0s   1s   2s   4s     8s             16s              32s
Auth:     0s                           10s             20s
Success:  90%  95%  97%  98.7%  99.4%  99.89%  99.94%  99.988%  99.994%

97% of clients succeed in the first 2 seconds.
99.4% of clients succeed without trying a second authority.
99.89% of clients succeed in the first 10 seconds.

0.11% of clients remain, but in this scenario, 2 authorities are down, so the client is most likely blocked from the Tor network.

The current implementation makes 1 or 2 authority connections within the first second, depending on exactly how the first connection fails. Under the 20% authority failure assumption, these clients would have a success rate of either 80% or 96% within a few seconds. The scheme above has a greater success rate in the first few seconds, while spreading the load among a larger number of directory mirrors. In addition, if all the authorities are blocked, current clients will inevitably fail, as they do not have a list of directory mirrors.
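
The success figures in the Reliability Analysis can be approximately reproduced from the listed retry times (mirrors at 0, 1, 2, 4, ... seconds, authorities at 0, 10, 20 seconds) and the two pessimistic failure rates. The following is a minimal sketch, not code from the proposal or from tor: the function names are made up, and it assumes that connection attempts fail independently of each other, which the proposal does not state explicitly.

```python
MIRROR_FAIL = 0.5  # proposal's pessimistic assumption for directory mirrors
AUTH_FAIL = 0.2    # proposal's pessimistic assumption for authorities


def retry_times(first_retry, count):
    """Attempt times in seconds: 0, first_retry, then doubling each time."""
    times = [0]
    delay = first_retry
    while len(times) < count:
        times.append(delay)
        delay *= 2
    return times


def cumulative_success(mirror_times, auth_times):
    """Return (time, P(at least one attempt so far succeeded)) pairs.

    Assumes independent failures; attempts launched at the same time are
    combined into a single step.
    """
    fail_by_time = {}
    for t in mirror_times:
        fail_by_time[t] = fail_by_time.get(t, 1.0) * MIRROR_FAIL
    for t in auth_times:
        fail_by_time[t] = fail_by_time.get(t, 1.0) * AUTH_FAIL

    results = []
    all_failed = 1.0
    for t in sorted(fail_by_time):
        all_failed *= fail_by_time[t]
        results.append((t, 1.0 - all_failed))
    return results


if __name__ == "__main__":
    mirrors = retry_times(1, 7)       # 7 of the first 10 attempts go to mirrors
    authorities = retry_times(10, 3)  # 3 of the first 10 go to authorities
    for t, success in cumulative_success(mirrors, authorities):
        print(f"{t:>3}s  {success * 100:.4f}%")
```

Under these assumptions the computed values track the Success row closely; the small differences in the last digit come from how the proposal rounds its figures.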

1 1

Status of Open Hidden Service Proposals (October 2015)
by George Kadianakis 22 Oct '15

Greetings,

it's well known that hidden services need some love: https://blog.torproject.org/blog/hidden-services-need-some-love

For the past 2 years we've been busy designing the upcoming hidden service protocol with improved cryptography, security, and performance. During this time we've written a good amount of improvement proposals and specifications, that have now been floating around our git repositories.

In this mail I aim to collect and briefly explain all these proposals in one place so that researchers and developers have easier access to them. Ideally we would also make a wiki page tracking them.

Similar efforts have been done for the set of all Tor proposals by Nick:
https://blog.torproject.org/blog/tor-design-proposals-how-we-make-changes-o…
https://gitweb.torproject.org/torspec.git/tree/proposals/proposal-status.txt

This might also make for an informative blog post if I clean it up a bit. Please let me know if I should try to get it posted on the blog so that it reaches a greater audience.

Let's start walking over each proposal in a hopefully reasonable order:

========================================================================

== Proposal 250: Random Number Generation During Tor Voting ==
[Prerequisite proposal]

URL: https://gitweb.torproject.org/torspec.git/tree/proposals/250-commit-reveal-…
Status: [Under development - https://trac.torproject.org/projects/tor/ticket/16943]

This is a prerequisite for the proposals that follow. It specifies how the Tor directory authorities can produce a fresh and unpredictable random value every day. We plan to use this value to randomize the responsible HSDirs of hidden services and make them unpredictable. This will help defend against attacks that require the attacker to become the HSDir of a hidden service.

== Proposal 224: Next-Generation Hidden Services in Tor ==
[Main proposal!]
URL: https://gitweb.torproject.org/torspec.git/tree/proposals/224-rend-spec-ng.t…
Status: [Under development - https://trac.torproject.org/projects/tor/ticket/12424]

This is the master proposal of the "Next Generation Hidden Services" project. It outlines a more or less completely revised version of the Tor hidden services protocol, improved to accommodate better cryptography and defenses for several attacks we'd never considered when we did the original design! The following proposals plug into the protocol specified by this proposal.

== Proposal 246: Merging Hidden Service Directories and Introduction Points ==
[Performance improvement]

URL: https://gitweb.torproject.org/torspec.git/tree/proposals/246-merge-hsdir-an…
Status: [Research Phase - https://lists.torproject.org/pipermail/tor-dev/2015-July/009079.html]

This document describes a modification to proposal 224, which simplifies and improves the architecture by combining hidden service directories and introduction points at the same relays. It will speed up the initial connection to hidden services considerably since only two circuit establishments will be needed instead of three.

== Proposal 247: Defending Against Guard Discovery Attacks using Vanguards ==
[Security improvement]

URL: https://gitweb.torproject.org/torspec.git/tree/proposals/247-hs-guard-disco…
Status: [Research Phase - https://lists.torproject.org/pipermail/tor-dev/2015-July/009066.html]

This document describes a modification to the path selection for hidden service circuits. It aims to defend against attacks where clients try to discover the hidden service's guard relay(s).

This proposal also depends on having better and more robust algorithms for guard node selection.
This requires another mini-proposal: https://lists.torproject.org/pipermail/tor-dev/2015-August/009297.html

== Proposal 255: Controller features to allow for load-balancing hidden services ==
[Scalability improvement]

URL: https://gitweb.torproject.org/torspec.git/tree/proposals/255-hs-load-balanc…
Discussion thread: https://lists.torproject.org/pipermail/tor-dev/2015-September/009597.html
Status: [Under Development - https://trac.torproject.org/projects/tor/ticket/17254]

We have plans for bringing hidden services to the next level. We are talking hidden services with 100x the clients they can currently handle, and with mechanisms that allow operators to load balance and achieve high availability.

This proposal defines a way for hidden services to _load balance_ their clients by allowing *multiple hosts* to do the actual rendezvous with the clients. This is something that busy hidden service operators need currently.

On the scaling front, we also worked on onionbalance, which gives operators _high availability_ by allowing multiple hosts to handle introductions. Onionbalance is already usable by operators, and we have various improvements that we want to make in the future: https://github.com/DonnchaC/onionbalance

== Proposal 252: Single Onion Services ==
[Optional Performance improvement]

URL: https://gitweb.torproject.org/torspec.git/tree/proposals/252-single-onion.t…
Status: [Research Phase - https://lists.torproject.org/pipermail/tor-dev/2015-September/009408.html]

Websites like blockchain.info and Facebook are starting to offer hidden services to their clients. They do so to protect their clients from the fundamental exit-node attacks and also to provide them with Tor-specific features. Using hidden services in this context is also good news for the whole Tor network, since hidden service circuits don't require exit relays, which are the current bottleneck of the network.
However, services like blockchain.info don't care about their anonymity; they only care about the anonymity of their clients. For services with this threat model, there are protocol modifications that we can make to provide greater performance and load balancing options, since they don't need the 3-hop anonymizing circuits of Tor.

Proposal 252 specifies how we can modify the Tor protocol to better accommodate services with this use case. Of course this would be an *opt-in setting* only for the services that want it.

== Proposal: Direct Onion Services ==
[Optional Performance Improvement]

URL: https://lists.torproject.org/pipermail/tor-dev/2015-April/008625.html
Status: [Under Development - https://trac.torproject.org/projects/tor/ticket/17178]

Proposal 252 "Single Onion Services" requires some protocol modifications that render it backwards _incompatible_. This means that Tor clients need to be updated to use these "single onion services".

In the meantime, services with the blockchain.info threat model that want to enjoy greater performance even with the current protocol can simply use 1-hop circuits for their server-side circuits. This should grant better performance at no cost to client anonymity while remaining backwards compatible. The "Direct Onion Services" proposal specifies how this should be done. I hear a newer version of the proposal will soon come out!

========================================================================

1 0

Load Balancing in 2.7 series - incompatible with OnionBalance ?
by Alec Muffett 22 Oct '15

So I’ve just had a conversation with dgoulet on IRC, which I will reformat and subedit here as a conversation regarding OnionBalance and issues in 2.6 and 2.7 when a recently rebooted HS publishes a fresh descriptor:

[…]

alecm: consider OnionBalance which - being a bunch of daemons on a bunch of servers - will be a lot more prone to intermittent failures of 1+ daemons yielding a lot of republishing
alecm: we tend to move services around, and daemons will be killed in one place and resurrected elsewhere, and then we'll have to bundle up a new descriptor and ship it out
dgoulet: hrm so with that new 027 cache behavior, as long as the IP are usable, the descriptor will be kept, if they all become unusable, a new descriptor fetch is triggered and then those IPs will be tried
alecm: There's a mandatory refresh [of the descriptor] after N minutes?
dgoulet: we'll retry 3 times and after that all HSDir are in timeout for 15 minutes (I think, I'll have to validate) before retrying any HSDirs
alecm: I wonder if descriptors should publish a recommended TTL - [number of seconds to live before refresh]
dgoulet: yeah we have an idea for a "revision-counter" in the descriptor being incremented at each new version for the 24 hours period
dgoulet: a TTL could be useful for load balancing though!
alecm: so, here's a scenario: imagine that we run 10 daemons,
alecm: call these daemons: A B C D E F G H I J - they all have random onion addresses
alecm: we steal one IP from each daemon, and bundle the 10 stolen IPs together to make an onionbalance site descriptor and publish it
alecm: people pull that descriptor, it's quite popular
alecm: we then lose power in a datacentre, which takes out half of our onions - say, A through E
alecm: we reboot the datacentre and restart A-E merely 10 minutes later
alecm: everyone who has already loaded our onionbalance site descriptor tests A B C D E and finds them all dead, because the old IPs for A-E are invalid
alecm: so they all move to F G H I J - which get overloaded even though (new) A B C D E are back up
alecm: and this persists for up to 24h, even though the outage was only 10 minutes
alecm: net result: large chunks of the world (anyone with an old descriptor + anyone randomly choosing F-J) have a shitty experience, which is not what high-availability is all about :-)
dgoulet: that will be what's going to happen - having a TTL in the desc. would help here indeed, I see the issue
dgoulet: TTL would be one thing to add, here we could also add a mechanism for a client retrying IPs that failed in the situation where some of the IPs are still working, or making client balance themself randomly could be also an idea
dgoulet: definitely there is some content here for tor-dev - I don't have a good answer but it should definitely be addressed
alecm: proper random selection of IP would be beneficial for load-balancing; not perfect, but in the long run, helpful

-- 
Alec Muffett
Security Infrastructure
Facebook Engineering
London
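
The A-J scenario can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not a model of real Tor client behaviour: it assumes every client holds the stale descriptor, tries its ten introduction points in a random order, settles on the first one that still answers, and never refetches while any introduction point works (as dgoulet describes for the 027 cache behaviour). The function name simulate_stale_clients is made up for illustration.

```python
import random
from collections import Counter


def simulate_stale_clients(num_clients, intro_points, dead, seed=0):
    """Count how many clients end up on each introduction point.

    Each client tries the intro points from its stale descriptor in a random
    order and uses the first one that is not dead. Clients that find every
    intro point dead are not counted (they would refetch the descriptor).
    """
    rng = random.Random(seed)
    load = Counter()
    for _ in range(num_clients):
        candidates = list(intro_points)
        rng.shuffle(candidates)
        for point in candidates:
            if point not in dead:
                load[point] += 1
                break
    return load


if __name__ == "__main__":
    points = "ABCDEFGHIJ"
    healthy = simulate_stale_clients(10000, points, dead=set())
    degraded = simulate_stale_clients(10000, points, dead=set("ABCDE"))
    for point in points:
        print(point, healthy[point], degraded[point])
```

With half of the old introduction points invalid, the surviving five absorb all stale clients, so each carries roughly twice its normal share even though the restarted daemons A-E are idle and healthy.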

4 4
