[tor-dev] New BridgeDB Distributor (was: Re: New BridgeDB Distributor (Twitter/SocialDistributor intersections, etc.))

22 Apr 2014

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512
With isis' and sysrqb's permission, moving the new BridgeDB
Distributor (and maybe general bridgedb distributor architecture
discussion) thread onto tor-dev@.
On 04/15/2014 10:30 PM, Kostas Jakeliunas wrote:
...
On 03/29/2014 10:08 AM, Matthew Finkel wrote:
...
(I took the liberty of making this readable again :))
On Fri, Mar 28, 2014 at 08:00:17PM +0200, Kostas Jakeliunas
wrote:
...
isis wrote:
...
Kostas Jakeliunas transcribed 7.9K bytes:
...
Hey isis,
wfn here. [...]
Hi!
Howdy!
I'm super excited to hear you're interested in working on this! 
[...]
...
...
...
[...] a couple of questions (more like vague musings)
[...]:
Would you personally think that incorporating "some" ideas
from #7520[1] ("Design and implement a social distributor
for BridgeDB") would be within the scope of a ~three+ month
project? The way I see it, if a twitter (or, say, xmpp+otr
as mentioned by you/others on IRC) distributor were to be 
planned, it would either need to
incorporate some form of churn rate control / Sybil
attack prevention, via e.g. recaptcha (I see that twitter
direct (=personal) messages can include images; they'll
probably be served by one of twitter's media CDNs (would need
to look things up), but it's probably safe to assume that
as long as twitter itself is not blocked, those CDNs won't
be, either);
Yes, this stuff is already built, and wouldn't be too hard
to incorporate. However, as I'm sure you already understand,
there is no Proof of Work system which actually works for
users while keeping adversaries out.
For sure, we always have to keep this in mind. Hopefully
there's a compromise that kinda-works, and eventually, given
some more metrics/diagnostic info intersected with OONI
hopefully being able to say which bridges don't work from which
countries, it'll be possible to actually carry out tests in a
kind-of-scientific/not-blind-guessing way..
At this point I just assume our adversary will always have more 
resources than us no matter which mechanism we use. More people,
more compute power/time, more money. At this point I think we
only have two things that they don't. We have more bridges and
more love for people. Leveraging this is...not easy, however. :(
POW is useful in some cases, for example, to prevent an asshole
from crawling bridgedb so that they can add all bridges to a
blacklist. When dealing with state-level adversaries I agree with
isis that they're of little use.
Agree.
...
...
...
...

or take an idea from the social distributor in #7520,
namely/probably, implement some form of token system.
This is not very doable in 6 weeks. It also, sadly, requires
the DB backend work (which I'll be doing over the next three
months, but might take more time).
Aha, understood, yes. So basically, ideally I'd write code that
could *later on* be easily extendable in relevant ways. But no
tokens for now.
Ideally this sounds like a good idea, however I'm not sure we (or
at least I) have a good handle on what bridgedb will look like in
6-12 months. It's undergoing a lot of change right now. Don't
interpret this as saying this is a bad idea because the more
abstract and extensible you make this distributor the more useful
it will be. I'm just a little worried about writing something for
the future. Perhaps there's a good way to design and plan for
this, though.
Yeah, understood. As I understand it, isis is changing some things
in bridgedb (bridgedb.Distributor, etc) right now / these days.
For now, the idea is to have a thing that works that is more or
less completely decoupled from the bridgedb codebase. If we do this
right, it will hopefully be relatively easy to then integrate it in
a way that will make sense at that point in time (e.g. as part of 
bridgedb.Distributor, *or* as a client to a core RESTful 
distributor/api/service that gives bridges to other 'third-party' 
distributors (see below.))
...
...
...
...
It might be possible to have some simplistic token system
with pre-chosen seed nodes, etc. Of course, security and
privacy implications ahoy - first and foremost, this would
result in more than zero places/people knowing the entire
social graph, unless your and other people's ideas (a
whole Pandora's box; I should attempt an honest read of
rBridge, et al.; have only skimmed as of now) re: oblivious
transfer, etc. were incorporated. Here it becomes quite
difficult to define short-ish term deliverables of course.
I know that you did quite a lot of research on the
private/secure social distributor idea.
Really, you don't want to get into this stuff. Or do, but
don't do it for GSoC. I've spent the past year painfully
writing proofs to correct the errors in that paper, and
discovered some major problems for anonymity in old 
"tried-and-true" cryptographic primitives in the process.
This is a HUGE project.
Sounds insanely intense, in both a good and a bad way! It's
definitely interesting, but this is a whole other level for
sure. Ok. (Btw, interesting re: proofs for crypto/whatever
concepts. Intense stuff.)
rBridge is definitely a large project. This is not to say your
help would not be appreciated [...] so I'm sure you can help,
however this is much larger than GSoC.
For sure, OK! (and yes, it sounds like an interesting avenue / way
of getting into zero knowledge madness etc etc. Will see how it
goes..)
...
...
...
...
But I wonder if it wouldn't make sense to attempt a
simplistic token system, as well as (possibly) some
hopefully-not-too-evil incarnation of recaptcha, to:
- have a system that would actually do its job;
- have a system that would be easily extendable into cryptomadness
  later on / when the time comes;
- have a framework for more complex social distributors later on.
The latter is maybe because a project that only does a naive
twitter distributor (via PMs) is not enough, in the sense of gsoc
scope (or I'd like to attempt something a bit more ambitious
anyway. But here of course dragon territory starts, with many
slippery slopes.) This would hopefully be very useful for later
work, with hopefully quite a bit of code reuse being
possible (vs. a twitter distributor that would use your
twisted stuff + additional code which would likely not be
beneficial for other distributors/projects.) 
Architecturally, this token-based distributor would be a
generic/parent class, with the particular twitter
distributor inheriting from it. Hopefully this would all
result in some clean, reusable code.
Ideally, the token-distributor would work in a (generic)
way that could be used in IRC, etc etc. I think a
simplistic version of it would still prove useful (assuming
we'd actually be OK with taking the responsibility of 
maintaining and knowing the social graph, and so on.) This
would be very nice indeed.
I know that you want this, and I know that you want it for
good reasons. But I refuse to have access to something which
might potentially get me shot in some countries.
Yeah, I see the problem here. Ok.
+1
[...]
...
Sorry for the kinda-delayed reply!
...
...
Cool.
So right now I'm thinking just doing a simplistic 
twitter-direct-message-based bot. I do believe some churn 
control / semi-working-PoW thing is needed. Reusing (large)
parts of bridgedb's recaptcha makes sense to me.
I don't think this is a crazy idea.
I wrote a simplistic twitter bot that pretends to give bridges to
people who ask:
https://github.com/wfn/twidibot
(tweepy (a python module) is used to interact with the twitter
API. Twitter has two main APIs: 'streaming' and the 'RESTful' API.
The streaming API just gives you event data about things that
you're interested in (so e.g. 'someone started following me',
'someone sent me a message.') The restful twitter api is for
actually doing things (like sending messages) and getting event
info on a per-query basis (we don't need the latter.))
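The streaming/REST split maps naturally onto a bot structure: incoming streaming events are dispatched to handlers, while anything the bot actually does (like sending a DM) is a REST call. Below is a minimal sketch of that shape, assuming a simplified Event type. It deliberately does not use tweepy; the event kind names ("follow", "direct_message") and the send_dm callable are hypothetical stand-ins, not tweepy's or twidibot's actual API.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Event:
    """Simplified stand-in for an event delivered by a streaming API."""
    kind: str        # hypothetical kinds: "follow" or "direct_message"
    sender: str      # the Twitter handle that triggered the event
    text: str = ""   # message body; empty for follow events


class BotDispatcher:
    """Routes incoming (streaming) events to handlers. Outgoing actions go
    through `send_dm`, which stands in for a REST API call."""

    def __init__(self, send_dm: Callable[[str, str], None]):
        self.send_dm = send_dm
        self.handlers: Dict[str, Callable[[Event], None]] = {
            "follow": self.on_follow,
            "direct_message": self.on_direct_message,
        }

    def dispatch(self, event: Event) -> None:
        handler = self.handlers.get(event.kind)
        if handler is not None:  # silently ignore event types we don't handle
            handler(event)

    def on_follow(self, event: Event) -> None:
        self.send_dm(event.sender, "Hi! Reply with 'get bridges' to get some bridges.")

    def on_direct_message(self, event: Event) -> None:
        if "bridges" in event.text.lower():
            self.send_dm(event.sender, "(placeholder) here are your bridges")
```

Keeping the REST side behind a plain callable means the dispatcher can be exercised without touching Twitter at all, e.g. by passing a function that just records what would have been sent.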
As of now, the bot is running under this account:
https://twitter.com/wfntestacct
Try following it; it should send you a direct message, to which you
can reply with e.g.
"get bridges" or "get me some bridges nao!" or "get obfs3
scramblesuit fte bridges"
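Free-form requests like the ones above can be recognised with a small parser. The following is a sketch under the assumption that a request is any message containing the word "get" followed later by "bridge(s)", and that the transport names are the three used in the example messages; it is not how twidibot itself parses messages.

```python
import re
from typing import List, Optional

# Transport names taken from the example messages; illustrative, not exhaustive.
KNOWN_TRANSPORTS = ("obfs3", "scramblesuit", "fte")

# "get" must appear as a whole word somewhere before "bridge"/"bridges".
REQUEST_RE = re.compile(r"\bget\b.*\bbridges?\b", re.IGNORECASE)


def parse_request(text: str) -> Optional[List[str]]:
    """Return the requested transports ([] means plain bridges),
    or None if the message does not look like a bridge request."""
    if not REQUEST_RE.search(text):
        return None
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in KNOWN_TRANSPORTS if t in words]
```

For example, "get me some bridges nao!" yields an empty list (plain bridges), while "get obfs3 scramblesuit fte bridges" yields all three transports.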
The bridge data returned is of course beyond-bogus (code is really
just a placeholder), but what is I think good is that I wrote the
thing in a way that can be easily extended - the
bridge-getting-process is abstracted away.
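One way to express "the bridge-getting process is abstracted away" is an abstract interface that the bot depends on, with a stub behind it for now. This is a sketch with hypothetical class and method names (BridgeGetter, get_bridges), not the contents of twidibot's bridge_getter.py; the stub returns bogus addresses from the 192.0.2.0/24 documentation range.

```python
from abc import ABC, abstractmethod
from typing import List, Sequence


class BridgeGetter(ABC):
    """Interface the bot talks to; where bridges come from is hidden."""

    @abstractmethod
    def get_bridges(self, handle: str, transports: Sequence[str]) -> List[str]:
        """Return bridge lines for `handle` using the given transports."""


class StubBridgeGetter(BridgeGetter):
    """Placeholder that returns one bogus bridge line per requested transport."""

    def get_bridges(self, handle: str, transports: Sequence[str]) -> List[str]:
        kinds = list(transports) or ["vanilla"]
        lines = []
        for i, kind in enumerate(kinds, start=1):
            addr = f"192.0.2.{i}:443"  # documentation-range address, not a real bridge
            lines.append(addr if kind == "vanilla" else f"{kind} {addr}")
        return lines
```

A real backend (say, a client of a future BridgeDB API) would then just be another subclass, and the bot code would not need to change.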
The main bot stuff is at 
https://github.com/wfn/twidibot/blob/master/twidibot/twitter_bot.py
The stub 'bridge-getter' is at 
https://github.com/wfn/twidibot/blob/master/twidibot/bridge_getter.py
(take a look at the comments at the top maybe[3])
...
Basically I wanted to see if there'd be problems with the twitter
api, and so on. I should try to somehow benchmark the thing, to see
if anything breaks (on e.g. twitter's end[4]) when there are many
requests, etc. (the code itself is kind-of-not-ready for that, but
it should be easy to fix this up; but I'm also worried about
twitter limits and that sort of thing - stuff that in the end may
be hard to control on our end; hence the PoC, etc.)
I also wanted to have some skeleton code that will support future 
abstraction/extension (i.e. hopefully it's not an ugly script.)
[3]: the comments (pretty much all there is there) make it sound as if
going the way they say we should go is the right way. The comments
should probably be interpreted as a flaky proposal/ideas at best!
[4]: By "problems on twitter's end" I of course meant twitter possibly
rate-limiting the bot, or even blacklisting it somehow due to high
direct message load (maybe lots of direct messages => anomalous
behaviour, but then again, probably not.)
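To reduce the chance of Twitter rate-limiting the bot under load, outgoing DMs could be paced on our end, e.g. with a token bucket. A minimal sketch follows; the capacity and refill rate are placeholders, not Twitter's actual limits (which would need to be looked up), and the injectable clock exists only so the behaviour can be tested deterministically.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow bursts of up to `capacity` sends, refilled at `rate` tokens/second.
    The numbers passed in are placeholders, not Twitter's real limits."""

    def __init__(self, capacity: float, rate: float,
                 clock: Callable[[], float] = time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.tokens = capacity
        self.last = clock()

    def try_acquire(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

When try_acquire() returns False, the bot would queue the reply and retry later rather than drop it.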
...
...
It also seems like we're combining the email and http distributor
with this, not that it is bad.
..so the whole plan in terms of bridgedb abstraction / 
bridge-getting-mechanism abstraction is not clear, I suppose.
Here's what asn wrote on that google melange project page[2] (as a 
comment to the proposal):
asn April 4, 2014, 11:09 p.m.:
...
Don't mind me too much, but it would be great if we could have a 
simple RESTful HTTPS distributor for BridgeDB, before writing
exotic distributors like Twitter. Such a distributor would expose
a simple REST API that clients (like tor-launcher etc.) could use
to fetch CAPTCHAs/bridges.
A RESTful distributor seems easier and it would also give us a
fair idea of the different methods that a BridgeDB distributor
needs to implement. It would serve as a basic distributor that
can be used as a skeleton to build more complicated ones.
just 2 cents
I then spoke with him briefly on irc. Basically, I'm up for writing
a RESTful bridge api/distributor, too (or focusing on it instead);
but maybe it makes sense to continue with the twitter bot (and have
the bridge-getting-mechanism be abstract enough to be completely
turned over / extended / changed easily, etc.), and to continue
thinking about the RESTful bridge api. Focusing on a twitter bot
for now will probably lead to clearer deliverables and results. But
we can develop things while thinking about related things in
parallel.
I'm sure isis has some ideas here, too (e.g. how feasible it is to 
implement a restful bridge distributor soon.) But I agree that it
would be nice to have a core distributor that could be used by
tor-launcher, and by other distributors (e.g. this twitter bot, as
well as future distributors.)
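To make the RESTful idea a bit more concrete, here is a minimal sketch using only Python's standard library http.server. The endpoint names (/captcha, /bridges), the transport query parameter, and the JSON shapes are all hypothetical; nothing here reflects an actual BridgeDB API. Routing is kept in a pure function so it can be tested without a socket.

```python
import json
from http.server import BaseHTTPRequestHandler
from typing import Tuple
from urllib.parse import parse_qs, urlparse


def route(path: str) -> Tuple[int, dict]:
    """Map a request path to (HTTP status, JSON body). Endpoints are hypothetical."""
    url = urlparse(path)
    if url.path == "/captcha":
        # A real distributor would return a CAPTCHA image and a challenge id.
        return 200, {"challenge_id": "placeholder", "image": None}
    if url.path == "/bridges":
        transport = parse_qs(url.query).get("transport", ["vanilla"])[0]
        return 200, {"transport": transport, "bridges": []}  # to be filled by BridgeDB
    return 404, {"error": "not found"}


class DistributorHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, body = route(self.path)
        payload = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)
```

For local experiments this could be served with HTTPServer(("127.0.0.1", 8080), DistributorHandler).serve_forever(); a real deployment would of course need HTTPS. Other distributors, such as a twitter bot, would then be clients of this API.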
...
I do wonder if there is a better rate-limiting mechanism than a
captcha that we can use. I suspect there is something, but it
won't be much better, in reality. But, also, I wonder if we want
to use a POW at all. Maybe only rate-limit a handle by time
period, similar to how we handle emails? I don't know what is
best. I think this can be decided in the coming months, though.
Yeah, for now I'll probably flesh out a
rate-limiting-by-time-period thingie - simplistic, but this is OK.
There are some nuances, e.g. this would require the twitter bot to
remember every user who was given bridges, at least for a short
while; even if this were not persisted / were only kept in
memory, it's a delicate thing I suppose, security- and privacy-wise.
etc.
But yeah, I'm not sure if a captcha PoW scheme is the way to go,
either. We can continue thinking about it, and meanwhile have
something simplistic.
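A rate limit per handle per time period could look roughly like the sketch below. To soften the privacy concern a little, it stores only HMAC-SHA256 digests of handles under a random key that lives in memory only, and it forgets entries as soon as their period expires. This is an assumption about one possible design, not a decided approach; note that anyone who obtains the key (e.g. from process memory) could still test candidate handles, so it reduces rather than eliminates the exposure.

```python
import hashlib
import hmac
import os
import time
from typing import Callable, Dict, Optional


class HandleRateLimiter:
    """Allow one bridge request per handle per `period` seconds.
    Handles are kept only as keyed hashes and pruned once expired."""

    def __init__(self, period: float, key: Optional[bytes] = None,
                 clock: Callable[[], float] = time.monotonic):
        self.period = period
        self.key = key if key is not None else os.urandom(32)  # never written to disk
        self.clock = clock
        self.last_seen: Dict[bytes, float] = {}

    def _digest(self, handle: str) -> bytes:
        # Twitter handles are case-insensitive, so normalise before hashing.
        return hmac.new(self.key, handle.lower().encode("utf-8"), hashlib.sha256).digest()

    def allow(self, handle: str) -> bool:
        now = self.clock()
        # Drop everyone whose period has passed, so state stays short-lived.
        self.last_seen = {d: t for d, t in self.last_seen.items()
                          if now - t < self.period}
        digest = self._digest(handle)
        if digest in self.last_seen:
            return False
        self.last_seen[digest] = now
        return True
```

Pruning on every call is linear in the number of remembered handles, which is fine for a sketch but would want a time-ordered structure at scale.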
...
...
Or would any of you prefer a different kind of distributor? IRC
was mentioned (churn control more difficult), as well as
XMPP+OTR (the latter would be more difficult to do for sure,
lots of things to integrate.) WhatsApp, too. Twitter, of
course, sounds the easiest / most easily deliverable. I can try to
come up with future goals/tasks should it turn out to be too
easy (ha! I'm sure I'm bound to stumble into unforeseen
problems..)
I think you should have enough time to complete the twitter
distributor and, at least, start working on another one. Twitter
seems like a good place to start. Some people would like an
XMPP+OTR distributor, but that will be a large and time consuming
project. Maybe a chat protocol that is popular in AsiaPAC is a
good second choice.
Yeah! Sounds good. Federated-chat-systems (like XMPP)-based
distributor would sure be nice. WhatsApp distributor would surely
be very useful, and would be pretty hard to censor, if only in the 
legal-repercussion-sense (lots of collateral damage so to speak.)
Let's think about this.
...
...
Basically, if there are other ideas afloat around bridgedb that
are doable / can be incorporated into this, let me know (maybe
you've been wishing to do something not too difficult but
simply do not have time?)
There are a few outstanding tickets that we really should 
fix/implement/do, so maybe look through them and see if any seem 
interesting to you?
...
As far as the twitter bot idea is concerned, it's pretty much 
straightforward, I guess. I can't think of a proper way to
subdivide the twitter distributor into further hashrings; i.e.
there'd probably need to be one single hashring for twitter,
and that's it. Of course the twitter-handlespace is larger
than, say, IPv4-space. I assume this is not a problem in and of
itself (there's a hashring for IPv6 (though not yet quite
functional), as I understand, etc.) But maybe there are some
nasty nuances to be found.
We could split the hashring into n partitions and then choose a 
partition based on some property of the handle. I don't know if
there is an advantage to this, though. Probably not considering
it would be trivial and cheap to create a new handle that is
mapped to another partition. Let us know if you think of
something :)
Haven't yet. :) for now, I think having a normal hashring, and
having a mechanism of giving only very few bridges (that remain the
same, unless new bridges are inserted in the neighbourhood) makes
sense.
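The "normal hashring, few bridges, stable unless new bridges land nearby" idea can be illustrated with a simplified consistent-hash ring. In this sketch (not BridgeDB's actual hashring code), bridges and handles are placed on the same ring via HMAC-SHA256 under a secret key, and a handle receives the next n bridges clockwise from its own position. Adding one bridge can displace at most one of the bridges a given handle sees, which is the stability property described above.

```python
import bisect
import hashlib
import hmac
from typing import List, Sequence


class HashRing:
    """Simplified consistent-hash ring keyed with HMAC-SHA256."""

    def __init__(self, key: bytes, bridges: Sequence[str]):
        self.key = key
        self.ring = sorted((self._pos(b), b) for b in bridges)
        self.positions = [p for p, _ in self.ring]

    def _pos(self, value: str) -> bytes:
        return hmac.new(self.key, value.encode("utf-8"), hashlib.sha256).digest()

    def get(self, handle: str, n: int = 3) -> List[str]:
        """Return the next `n` bridges clockwise from the handle's position."""
        if not self.ring:
            return []
        start = bisect.bisect_left(self.positions, self._pos(handle.lower()))
        count = min(n, len(self.ring))
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(count)]
```

Because the handle space is hashed onto the ring, its size relative to e.g. IPv4 space does not matter much; what matters is that the key stays secret, so nobody can precompute which handles map to which bridges.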
...
...
...
...
Will do some thinking, finally have time now.
--
Kostas / wfn
0x0e5dce45 @ pgp.mit.edu
-- ♥Ⓐ isis agora lovecruft
As an aside, I'm usually not this cynical. It was a long day
at work and it made me grumpy. But in any case, I'm really happy
you decided to look at making one of these!
All the best, Matt
Good stuff. :) (fwiw, nothing of this sounded cynical, at all.
Though everything depends on definitions, ha! 
https://en.wikipedia.org/wiki/Cynicism_(philosophy) )
The TL;DR would be
* working on a twitter-bridgedb-bot, PoC is at
https://github.com/wfn/twidibot/
* feel free to try and break https://twitter.com/wfntestacct (flood
it with messages! follow, un-follow, re-follow, try to crash it if
you'd like!)
* just found out that attaching images to direct messages via the API
is not possible: https://dev.twitter.com/discussions/24116
(whoops! And no updates to the API docs elsewhere); still possible to link
to images, etc. But kind of sad nevertheless.
* will implement a generic rate control mechanism. Will either use
requests-per-time-period (this would require the distributor to
remember some state about users for a while, which is a somewhat
delicate thing), (and/)or text-based challenge-response (which can
later on be replaced by something more sophisticated.)
* as of now, planning not to stop with a twitter distributor. Would
sure be nice to also have an XMPP-based one, too.
* RESTful BridgeDB Distributor / bridge API discussion is welcome.
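The text-based challenge-response mentioned in the list could start out as something as simple as the sketch below: a per-handle arithmetic question that expires after a short time and can be answered only once. The class and method names are hypothetical, and a question like this is trivially solvable by a script, so it is only a placeholder for "something more sophisticated" later on.

```python
import secrets
import time
from typing import Callable, Dict, Tuple


class TextChallenger:
    """Issue a simple arithmetic question per handle and check the reply.
    Pending challenges expire after `ttl` seconds and are single-use."""

    def __init__(self, ttl: float = 300.0, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self.pending: Dict[str, Tuple[int, float]] = {}  # handle -> (answer, expiry)

    def issue(self, handle: str) -> str:
        a, b = secrets.randbelow(10), secrets.randbelow(10)
        self.pending[handle.lower()] = (a + b, self.clock() + self.ttl)
        return f"What is {a} + {b}? Reply with just the number."

    def check(self, handle: str, reply: str) -> bool:
        # Pop first: each challenge can be answered once, right or wrong.
        entry = self.pending.pop(handle.lower(), None)
        if entry is None:
            return False
        answer, expires = entry
        if self.clock() > expires:
            return False
        return reply.strip() == str(answer)
```

Like the time-period approach, this keeps per-user state, but only until the challenge is answered or expires.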
Did I miss something, or garble things up?
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIcBAEBCgAGBQJTVncOAAoJEMCyQlAOXc5FnZwP/2YGPQnd9sXyET+O8sSbEMvA
g+6heUOsDK6k5tEEn7g700c2KVEaVREXLOlP68d27h/A0eM7b4nGjSJIhk1pcOIn
30DgaePIkaCPng5k5By1Z4ao79MwF7lEq/OHvN8+3W+Oje3HJzwogJOoYW7tH5Yi
6eqPPrOxNXcJ5zsRbUlzzl0uVFbuPf2JdioIkj3X5coOc6m0h6gKSrsNAV3Nvd6R
oFTT7IvDxu8EMm8j+a07JSyZ/VePyKBfW0XMOfi/delU1wk691xd4OHOCua6Q8pQ
/otiUXEsMnSMSBmTjc97s6H2S4hvwnET/1eu9zPY0R2PzaK5MBOeKezGOLCGxA3l
2wWWADp5xc3Y0CfT8bjpUhzdCYR1UADUhC7vuCzdL7F81RBksL1Zl/ZtxnCYjSIQ
lQXe0x1Xid2rbsgIfaW7oWhcx0muxUbiQI+rLhMFcSRii9XD5YPzhfAg+ATmEn4+
BnVHFGJUeguY8T0OAy3bab/ASTUaN42AXceX7bs2OBOHqeNeGxP/PCjqfnMO4jj2
N5nJheAXpNyPXLUMoX1Vmwz+eQyfxelLZ1BdRwerGPp5zgCMWUT5YvGG/2Yl9WbR
WEXIlDQb3PFJesIBCkqOMRs9IZwsCe07onEX8yspPE+RFFUOcsZIjKo3sL6WbP61
6w1qyJh0wWScZoE7NDxY
=Tepi
-----END PGP SIGNATURE-----