With isis' and sysrqb's permission, moving the new BridgeDB Distributor (and maybe general bridgedb distributor architecture discussion) thread onto tor-dev@.
On 04/15/2014 10:30 PM, Kostas Jakeliunas wrote:
On 03/29/2014 10:08 AM, Matthew Finkel wrote:
(I took the liberty of making this readable again :))
On Fri, Mar 28, 2014 at 08:00:17PM +0200, Kostas Jakeliunas wrote:
isis wrote:
Kostas Jakeliunas transcribed 7.9K bytes:
Hey isis,
wfn here. [...]
Hi!
Howdy!
I'm super excited to hear you're interested in working on this! [...]
[...] a couple of questions (more like vague musings) [...]:
Would you personally think that incorporating "some" ideas from #7520[1] ("Design and implement a social distributor for BridgeDB") would be within the scope of a ~three+ month project? The way I see it, if a twitter (or, say, xmpp+otr as mentioned by you/others on IRC) distributor were to be planned, it would either need to
- incorporate some form of churn rate control / Sybil attack prevention, via e.g. recaptcha (I see that twitter direct (=personal) messages can include images; they'll probably be served by one of twitter media CDNs (would need to look things up), but it's probably safe to assume that as long as twitter itself is not blocked, those CDNs won't be, either);
Yes, this stuff is already built, and wouldn't be too hard to incorporate. However, as I'm sure you already understand, there is no Proof of Work system which actually works for users while keeping adversaries out.
For sure, we always have to keep this in mind. Hopefully there's a compromise that kinda-works, and eventually, given some more metrics/diagnostic info intersected with OONI hopefully being able to say which bridges don't work from which countries, it'll be possible to actually carry out tests in a kind-of-scientific/not-blind-guessing way..
At this point I just assume our adversary will always have more resources than us no matter which mechanism we use. More people, more compute power/time, more money. At this point I think we only have two things that they don't. We have more bridges and more love for people. Leveraging this is...not easy, however. :( POW is useful in some cases, for example, to prevent an asshole from crawling bridgedb so that they can add all bridges to a blacklist. When dealing with state-level adversaries I agree with isis that they're of little use.
Agree.
- or take an idea from the social distributor in #7520, namely/probably, implement some form of token system.
This is not very doable in 6 weeks. It also, sadly, requires the DB backend work (which I'll be doing over the next three months, but might take more time).
Aha, understood, yes. So basically, ideally I'd write code that could *later on* be easily extendable in relevant ways. But no tokens for now.
Ideally this sounds like a good idea; however, I'm not sure we (or at least I) have a good handle on what bridgedb will look like in 6-12 months. It's undergoing a lot of change right now. Don't interpret this as saying it's a bad idea: the more abstract and extensible you make this distributor, the more useful it will be. I'm just a little worried about writing something for the future. Perhaps there's a good way to design and plan for this, though.
Yeah, understood. As I understand it, isis is changing some things in bridgedb (bridgedb.Distributor, etc) right now / these days.
For now, the idea is to have a thing that works that is more or less completely decoupled from the bridgedb codebase. If we do this right, it will hopefully be relatively easy to then integrate it in a way that will make sense at that point in time (e.g. as part of bridgedb.Distributor, *or* as a client to a core RESTful distributor/api/service that gives bridges to other 'third-party' distributors (see below.))
It might be possible to have some simplistic token system with pre-chosen seed nodes, etc. Of course, security and privacy implications ahoy: first and foremost, this would result in more than zero places/people knowing the entire social graph, unless your and other people's ideas re: oblivious transfer, etc. were incorporated (the whole Pandora's box; I should attempt an honest read of rBridge, et al., which I have only skimmed as of now). Here it becomes quite difficult to define short-ish-term deliverables, of course. I know that you did quite a lot of research on the private/secure social distributor idea.
Really, you don't want to get into this stuff. Or do, but don't do it for GSoC. I've spent the past year painfully writing proofs to correct the errors in that paper, and discovered some major problems for anonymity in old "tried-and-true" cryptographic primitives in the process.
This is a HUGE project.
Sounds insanely intense, in both a good and a bad way! It's definitely interesting, but this is a whole other level for sure. Ok. (Btw, interesting re: proofs for crypto/whatever concepts. Intense stuff.)
rBridge is definitely a large project. This is not to say your help would not be appreciated [...] so I'm sure you can help, however this is much larger than GSoC.
For sure, OK! (and yes, it sounds like an interesting avenue / way of getting into zero knowledge madness etc etc. Will see how it goes..)
But I wonder if it wouldn't make sense to attempt a simplistic token system, as well as (possibly) some hopefully-not-too-evil incarnation of recaptcha, to
- have a system that would actually do its job;
- have a system that would be easily extensible into cryptomadness later on / when the time comes;
- have a framework for more complex social distributors later on.
The latter is partly because only doing a naive twitter distributor (via PMs) is not enough in terms of GSoC scope (or rather, I'd like to attempt something a bit more ambitious anyway. But here, of course, dragon territory starts, with many slippery slopes.) This would hopefully be very useful for later work, with quite a bit of code reuse being possible (vs. a twitter distributor that would use your twisted stuff + additional code which would likely not be beneficial for other distributors/projects.) Architecturally, this token-based distributor would be a generic/parent class, with the particular twitter distributor inheriting from it. Hopefully this would all result in some clean, reusable code.
Ideally, the token-distributor would work in a (generic) way that could be used in IRC, etc etc. I think a simplistic version of it would still prove useful (assuming we'd actually be OK with taking the responsibility of maintaining and knowing the social graph, and so on.) This would be very nice indeed.
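A minimal sketch of the parent/child split described above, assuming Python and an abstract base class. The names `ChatDistributor`, `TwitterDistributor`, `IRCDistributor` and `send_message` are hypothetical, not taken from twidibot or bridgedb. Tokens are deliberately left out, since the thread puts them out of scope; the point is only that request handling lives in the parent and each platform subclass implements delivery.

```python
from abc import ABC, abstractmethod


class ChatDistributor(ABC):
    """Hypothetical platform-agnostic parent class."""

    def handle_request(self, handle: str, text: str) -> str:
        # Platform-independent logic; a real distributor would consult a
        # bridge source and a rate limiter here.
        if "bridges" in text.lower():
            reply = "Here are your bridges: (placeholder)"
        else:
            reply = "Send 'get bridges' to receive bridges."
        self.send_message(handle, reply)
        return reply

    @abstractmethod
    def send_message(self, handle: str, text: str) -> None:
        """Deliver `text` to `handle` over the concrete platform."""


class TwitterDistributor(ChatDistributor):
    def __init__(self) -> None:
        self.outbox = []  # stands in for direct-message API calls

    def send_message(self, handle: str, text: str) -> None:
        self.outbox.append((handle, text))


class IRCDistributor(ChatDistributor):
    def __init__(self) -> None:
        self.outbox = []  # stands in for IRC private messages

    def send_message(self, handle: str, text: str) -> None:
        self.outbox.append((handle, text))
```

Adding another platform (XMPP, say) would then mean writing one more `send_message`; the outbox lists here only stand in for real platform API calls.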
I know that you want this, and I know that you want it for good reasons. But I refuse to have access to something which might potentially get me shot in some countries.
Yeah, I see the problem here. Ok.
+1
[...]
Sorry for the kinda-delayed reply!
Cool.
So right now I'm thinking just doing a simplistic twitter-direct-message-based bot. I do believe some churn control / semi-working-PoW thing is needed. Reusing (large) parts of bridgedb's recaptcha makes sense to me.
I don't think this is a crazy idea.
I wrote a simplistic twitter bot that pretends to give bridges to people who ask:
https://github.com/wfn/twidibot
(tweepy (a python module) is used to interact with the twitter API. Twitter has two main APIs: 'streaming' and the 'RESTful' API. The streaming API just gives you event data about things that you're interested in (so e.g. 'someone started following me', 'someone sent me a message.') The restful twitter api is for actually doing things (like sending messages) and getting event info on a per-query basis (we don't need the latter.))
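To illustrate the "streaming API gives you events" model, here is a small dispatcher sketch. It does not use tweepy at all: the event dicts (`{"type": "follow", "user": ...}`) are a hypothetical simplified shape, not Twitter's actual payload format, and a real bot would feed `dispatch` from whatever stream listener it uses.

```python
from typing import Callable, Dict, Optional

Event = Dict[str, str]
Handler = Callable[[Event], Optional[str]]


class EventDispatcher:
    """Routes incoming stream events to handlers by event type."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def on(self, event_type: str) -> Callable[[Handler], Handler]:
        # Decorator that registers a handler for one event type.
        def register(func: Handler) -> Handler:
            self._handlers[event_type] = func
            return func
        return register

    def dispatch(self, event: Event) -> Optional[str]:
        handler = self._handlers.get(event.get("type", ""))
        # Event types nobody registered for are ignored.
        return handler(event) if handler else None


dispatcher = EventDispatcher()


@dispatcher.on("follow")
def greet(event: Event) -> str:
    return f"Hi @{event['user']}! Reply 'get bridges' to get some."


@dispatcher.on("direct_message")
def answer(event: Event) -> str:
    return f"(would look up bridges for @{event['user']})"
```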
As of now, the bot is running under this account:
https://twitter.com/wfntestacct
Try following it; it should send you a direct message, to which you can reply with e.g.
"get bridges" or "get me some bridges nao!" or "get obfs3 scramblesuit fte bridges"
The bridge data returned is of course beyond-bogus (code is really just a placeholder), but what is I think good is that I wrote the thing in a way that can be easily extended - the bridge-getting-process is abstracted away.
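The example messages above ("get bridges", "get obfs3 scramblesuit fte bridges") suggest a very forgiving parser. A possible sketch, assuming a request is any message containing both "get" and "bridges", and hard-coding the three transport names from those examples (a real bot would presumably get the list from BridgeDB):

```python
import re
from typing import Optional, Set

# Transport names taken from the example messages; hard-coded for the sketch.
KNOWN_TRANSPORTS = {"obfs3", "scramblesuit", "fte"}


def parse_request(text: str) -> Optional[Set[str]]:
    """Return the requested transports, or None if this is not a request.

    An empty set means "plain bridges, no particular transport".
    """
    words = re.findall(r"[a-z0-9]+", text.lower())
    if "get" not in words or "bridges" not in words:
        return None
    return {w for w in words if w in KNOWN_TRANSPORTS}
```

With this, "get me some bridges nao!" yields an empty set (plain bridges), while a message without both keywords yields None and can get a help reply instead.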
The main bot stuff is at https://github.com/wfn/twidibot/blob/master/twidibot/twitter_bot.py
The stub 'bridge-getter' is at https://github.com/wfn/twidibot/blob/master/twidibot/bridge_getter.py
(take a look at the comments at the top maybe[3])
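One way to keep the bridge-getting process swappable is a small interface that the bot depends on, with the bogus stub as just one implementation. This is a hypothetical sketch: `BridgeGetter`, `StubBridgeGetter` and `format_reply` are illustrative names, not the ones in bridge_getter.py, and the placeholder addresses come from the 192.0.2.0/24 documentation range.

```python
from typing import List, Protocol, Sequence


class BridgeGetter(Protocol):
    """Anything that can hand out bridge lines for a given user."""

    def get_bridges(self, handle: str, transports: Sequence[str]) -> List[str]:
        ...


class StubBridgeGetter:
    """Returns obviously bogus placeholder lines."""

    def get_bridges(self, handle: str, transports: Sequence[str]) -> List[str]:
        kinds = list(transports) or ["vanilla"]
        return [f"{kind} 192.0.2.{i + 1}:443 (placeholder)"
                for i, kind in enumerate(kinds)]


def format_reply(getter: BridgeGetter, handle: str, transports: Sequence[str]) -> str:
    # The bot only depends on the BridgeGetter interface, so replacing the
    # stub with a BridgeDB-backed or HTTP-backed getter needs no bot changes.
    lines = getter.get_bridges(handle, transports)
    return "\n".join(lines) if lines else "No bridges available right now."
```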
Basically I wanted to see if there'd be problems with the twitter api, and so on. I should try to somehow benchmark the thing, to see if anything breaks (on e.g. twitter's end[4]) when there are many requests, etc. (the code itself is kind-of-not-ready for that, but it should be easy to fix this up; but I'm also worried about twitter limits and that sort of thing - stuff that in the end may be hard to control on our end; hence the PoC, etc.)
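On the outgoing side, one hedge against platform rate limits is to pace direct messages with a token bucket and queue whatever doesn't fit. A minimal sketch; the `rate` and `capacity` values are placeholders, since Twitter's actual limits aren't stated here and would need to be looked up. The clock is injectable so the behaviour can be tested without sleeping.

```python
import time
from typing import Callable


class TokenBucket:
    """Allows on average `rate` messages per second, bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)
        self.last = clock()

    def try_acquire(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller should queue the message and retry later
```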
I also wanted to have some skeleton code that will support future abstraction/extension (i.e. hopefully it's not an ugly script.)
[3]: the comments (pretty much all there is there) make it sound as if going the way they say we should go is the right way. The comments should probably be interpreted as a flaky proposal/ideas at best!
[4]: By "problems on twitter's end" I of course meant twitter possibly rate-limiting the bot, or even blacklisting it somehow due to high direct message load (maybe lots of direct messages => anomalous behaviour, but then again, probably not.)
It also seems like we're combining the email and http distributor with this, not that it is bad.
..so the whole plan in terms of bridgedb abstraction / bridge-getting-mechanism abstraction is not clear, I suppose.
Here's what asn wrote on that google melange project page[2] (as a comment to the proposal):
asn April 4, 2014, 11:09 p.m.:
Don't mind me too much, but it would be great if we could have a simple RESTful HTTPS distributor for BridgeDB, before writing exotic distributors like Twitter. Such a distributor would expose a simple REST API that clients (like tor-launcher etc.) could use to fetch CAPTCHAs/bridges.
A RESTful distributor seems easier and it would also give us a fair idea of the different methods that a BridgeDB distributor needs to implement. It would serve as a basic distributor that can be used as a skeleton to build more complicated ones.
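A minimal sketch of what such a RESTful distributor could look like, as a plain WSGI app using only the standard library. The `GET /bridges?transport=...` endpoint and the JSON shape are hypothetical, CAPTCHA handling is omitted, and `get_bridges` is a placeholder for a call into BridgeDB.

```python
import json
from urllib.parse import parse_qs


def get_bridges(transport: str) -> list:
    # Placeholder bridge source; a real distributor would ask BridgeDB.
    prefix = f"{transport} " if transport else ""
    return [f"{prefix}192.0.2.1:443 (placeholder)"]


def _respond(start_response, status, payload, extra_headers=()):
    body = json.dumps(payload).encode("utf-8")
    headers = [("Content-Type", "application/json"),
               ("Content-Length", str(len(body)))]
    start_response(status, headers + list(extra_headers))
    return [body]


def app(environ, start_response):
    """Minimal WSGI app exposing a hypothetical GET /bridges endpoint."""
    if environ.get("PATH_INFO") != "/bridges":
        return _respond(start_response, "404 Not Found", {"error": "not found"})
    if environ.get("REQUEST_METHOD") != "GET":
        return _respond(start_response, "405 Method Not Allowed",
                        {"error": "use GET"}, [("Allow", "GET")])
    query = parse_qs(environ.get("QUERY_STRING", ""))
    transport = query.get("transport", [""])[0]
    return _respond(start_response, "200 OK", {"bridges": get_bridges(transport)})
```

For local experiments it can be served with `wsgiref.simple_server.make_server("127.0.0.1", 8080, app).serve_forever()`; a deployed distributor would of course sit behind HTTPS. Other distributors (the twitter bot included) could then be thin clients of this one.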
just 2 cents
I then spoke with him briefly on irc. Basically, I'm up for writing a RESTful bridge api/distributor, too (or focusing on it instead); but maybe it makes sense to continue with the twitter bot (and have the bridge-getting-mechanism be abstract enough to be completely turned over / extended / changed easily, etc.), and to continue thinking about the RESTful bridge api. Focusing on a twitter bot for now will probably lead to clearer deliverables and results. But we can develop things while thinking about related things in parallel.
I'm sure isis has some ideas here, too (e.g. how feasible it is to implement a restful bridge distributor soon.) But I agree that it would be nice to have a core distributor that could be used by tor-launcher, and by other distributors (e.g. this twitter bot, as well as future distributors.)
I do wonder if there is a better rate-limiting mechanism than a captcha that we can use. I suspect there is something, but it won't be much better, in reality. But, also, I wonder if we want to use a POW at all. Maybe only rate-limit a handle by time period, similar to how we handle emails? I don't know what is best. I think this can be decided in the coming months, though.
Yeah, for now I'll probably flesh out a rate-limiting-by-time-period thingie: simplistic, but this is OK. There are some nuances, e.g. this would require the twitter bot to remember every user who was given bridges, at least for a short while; even if this were not persisted / were only kept in memory, it's a delicate thing, I suppose, security- and privacy-wise, etc.
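A sketch of rate-limiting by time period that tries to soften the "remembering users" problem: state is memory-only, each handle is stored as an HMAC under a random per-process key rather than in the clear, and entries are forgotten once their window expires. The policy (one request per window) and the case-insensitive handle comparison are assumptions for illustration. Hashing only reduces exposure: anyone holding the key and a candidate handle can still test whether it asked recently.

```python
import hashlib
import hmac
import os
import time
from typing import Callable, Dict, Optional


class HandleRateLimiter:
    """Allow one bridge request per handle per `window` seconds."""

    def __init__(self, window: float, key: Optional[bytes] = None,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.window = window
        self.key = key if key is not None else os.urandom(32)
        self.clock = clock
        self._last_seen: Dict[bytes, float] = {}  # HMAC digest -> time

    def _digest(self, handle: str) -> bytes:
        return hmac.new(self.key, handle.lower().encode(), hashlib.sha256).digest()

    def allow(self, handle: str) -> bool:
        now = self.clock()
        # Forget every handle whose window has passed.
        self._last_seen = {d: t for d, t in self._last_seen.items()
                           if now - t < self.window}
        digest = self._digest(handle)
        if digest in self._last_seen:
            return False
        self._last_seen[digest] = now
        return True
```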
But yeah, I'm not sure if a captcha PoW scheme is the way to go, either. We can continue thinking about it, and meanwhile have something simplistic.
Or would any of you prefer a different kind of distributor? IRC was mentioned (churn control more difficult), as well as XMPP+OTR (the latter would be more difficult to do for sure, lots of things to integrate.) WhatsApp, too. Twitter, of course, sounds the easiest / most easily deliverable. I can try to come up with future goals/tasks should it turn out to be too easy (ha! I'm sure I'm bound to stumble into unforeseen problems..)
I think you should have enough time to complete the twitter distributor and, at least, start working on another one. Twitter seems like a good place to start. Some people would like an XMPP+OTR distributor, but that will be a large and time-consuming project. Maybe a chat protocol that is popular in AsiaPAC is a good second choice.
Yeah! Sounds good. Federated-chat-systems (like XMPP)-based distributor would sure be nice. WhatsApp distributor would surely be very useful, and would be pretty hard to censor, if only in the legal-repercussion-sense (lots of collateral damage so to speak.) Let's think about this.
Basically, if there are other ideas afloat around bridgedb that are doable / can be incorporated into this, let me know (maybe you've been wishing to do something not too difficult but simply do not have time?)
There are a few outstanding tickets that we really should fix/implement/do, so maybe look through them and see if any seem interesting to you?
As far as the twitter bot idea is concerned, it's pretty much straightforward, I guess. I can't think of a proper way to subdivide the twitter distributor into further hashrings; i.e. there'd probably need to be one single hashring for twitter, and that's it. Of course the twitter-handlespace is larger than, say, IPv4-space. I assume this is not a problem in and of itself (there's a hashring for IPv6 (though not yet quite functional), as I understand, etc.) But maybe there are some nasty nuances to be found.
We could split the hashring into n partitions and then choose a partition based on some property of the handle. I don't know if there is an advantage to this, though. Probably not considering it would be trivial and cheap to create a new handle that is mapped to another partition. Let us know if you think of something :)
Haven't yet. :) For now, I think having a normal hashring, and having a mechanism of giving out only very few bridges (which remain the same unless new bridges are inserted in the neighbourhood), makes sense.
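A sketch of the "same few bridges per handle" behaviour using a consistent-hashing ring: bridges and handles are placed on the ring by an HMAC, and a handle gets the next `k` bridges clockwise from its position. This illustrates the general technique and is not BridgeDB's hashring code; the key, `k = 3` and the addresses are placeholders.

```python
import bisect
import hashlib
import hmac
from typing import Iterable, List


class HashRing:
    """A handle keeps the same bridges unless bridges are added or removed
    near its position on the ring."""

    def __init__(self, key: bytes, bridges: Iterable[str]) -> None:
        self.key = key
        self._ring = sorted((self._pos(b), b) for b in set(bridges))
        self._positions = [p for p, _ in self._ring]

    def _pos(self, value: str) -> int:
        mac = hmac.new(self.key, value.encode(), hashlib.sha256).digest()
        return int.from_bytes(mac[:8], "big")

    def get(self, handle: str, k: int = 3) -> List[str]:
        if not self._ring:
            return []
        start = bisect.bisect_left(self._positions, self._pos(handle.lower()))
        k = min(k, len(self._ring))
        # Walk clockwise from the handle's position, wrapping at the end.
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(k)]
```

Adding one bridge changes at most one of a handle's `k` bridges, and only if it lands in that handle's neighbourhood; as noted above, though, a new twitter handle is cheap, so this limits exposure per handle rather than per person.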
Will do some thinking, finally have time now.
--
Kostas / wfn
0x0e5dce45 @ pgp.mit.edu
-- ♥Ⓐ isis agora lovecruft
As an aside, I'm not usually this cynical. It was a long day at work and it made me grumpy. But in any case, I'm really happy you decided to look at making one of these!
All the best, Matt
Good stuff. :) (fwiw, nothing of this sounded cynical, at all. Though everything depends on definitions, ha! https://en.wikipedia.org/wiki/Cynicism_(philosophy) )
The TL;DR would be
* working on a twitter-bridgedb-bot, PoC is at https://github.com/wfn/twidibot/
* feel free to try and break https://twitter.com/wfntestacct (flood it with messages! follow, un-follow, re-follow, try to crash it if you'd like!)
* just found out attaching images to direct messages over API directly is not possible: https://dev.twitter.com/discussions/24116 (whoops! And no updates to API doc elsewhere); still possible to link to images, etc. But kind of sad nevertheless.
* will implement a generic rate control mechanism. Will either use requests-per-time-period (this would require the distributor to remember some state about users for a while, which is a somewhat delicate thing), (and/)or text-based challenge-response (which can later on be replaced by something more sophisticated.)
* as of now, planning not to stop with a twitter distributor. Would sure be nice to have an XMPP-based one, too.
* RESTful BridgeDB Distributor / bridge API discussion is welcome.
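For the text-based challenge-response option, one way to avoid remembering per-user state is to derive the question from an HMAC of the handle and the current time bucket, and simply recompute it when the answer arrives. A hypothetical sketch: the key, the 600-second bucket and the arithmetic question are placeholders, and a bot could solve this trivially, so it is only a stand-in for something more sophisticated later on.

```python
import hashlib
import hmac
from typing import Tuple

WORDS = ["zero", "one", "two", "three", "four",
         "five", "six", "seven", "eight", "nine"]


class TextChallenge:
    """Stateless challenge: nothing per-user is stored between messages."""

    def __init__(self, key: bytes, bucket_seconds: int = 600) -> None:
        self.key = key
        self.bucket_seconds = bucket_seconds

    def _operands(self, handle: str, bucket: int) -> Tuple[int, int]:
        msg = f"{handle.lower()}|{bucket}".encode()
        mac = hmac.new(self.key, msg, hashlib.sha256).digest()
        return mac[0] % 10, mac[1] % 10

    def question(self, handle: str, now: float) -> str:
        a, b = self._operands(handle, int(now // self.bucket_seconds))
        return f"What is {WORDS[a]} plus {WORDS[b]}? Reply with digits."

    def check(self, handle: str, answer: str, now: float) -> bool:
        bucket = int(now // self.bucket_seconds)
        # Also accept the previous bucket, so a question asked just before
        # a bucket boundary can still be answered.
        for bkt in (bucket, bucket - 1):
            a, b = self._operands(handle, bkt)
            if answer.strip() == str(a + b):
                return True
        return False
```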
Did I miss something, or garble things up?