Hi everyone!
It has been quite some time since the last update; anyone who wants the
details in between can find them on the DIAL blog for the project [1].
As of now, we have a working project with the details listed in [2]
implemented. The dotted part (the consensus module), an idea taken from the
Senser paper [3], hasn't been implemented yet, but I hope to implement it
within a week or two at most. I was personally leaning towards comparing
the similarity of the structure, but after some discussions with woswos and
Micah Sherr [4], I now plan to implement the content-based approach too.
I'll briefly describe both the methods below:
+ Structure of the website: This was considered because we don't really
know in advance what changes a website might go through. It would be
especially useful for dynamic websites and for websites whose language
depends on geolocation (geotargeting). However, I have to use a filter list
and a statistical method to approach the problem.
+ Content-based approach: Compares the content of the HTML data using a
tree-like structure and hashes to determine how the structure differs or
matches. Proxies in the same locations are used as vantage points to get
better results.
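To make the two ideas a bit more concrete, here is a minimal sketch (not
the code in `Analyzer.py`). It simplifies the tree to a flat sequence of
opening tags and hashes only the visible text; the function names
`structure_similarity` and `content_hash` are made up for illustration.

```python
import hashlib
from difflib import SequenceMatcher
from html.parser import HTMLParser


class _Outline(HTMLParser):
    """Collects the sequence of opening tags and the visible text of a page."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_data(self, data):
        stripped = data.strip()
        if stripped:
            self.text.append(stripped)


def outline(html):
    """Return (tag sequence, text fragments) for an HTML string."""
    parser = _Outline()
    parser.feed(html)
    parser.close()
    return parser.tags, parser.text


def structure_similarity(html_a, html_b):
    """Ratio in [0, 1] describing how similar the two tag sequences are."""
    tags_a, _ = outline(html_a)
    tags_b, _ = outline(html_b)
    return SequenceMatcher(None, tags_a, tags_b).ratio()


def content_hash(html):
    """SHA-256 over the visible text, ignoring the markup around it."""
    _, text = outline(html)
    return hashlib.sha256("\n".join(text).encode("utf-8")).hexdigest()
```

A page fetched over Tor and the same page fetched from a control vantage
point can then be compared on both axes: a translated page keeps its
structure but changes its content hash, while an error page usually
changes both.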
That said, the above-mentioned methods are meant for the case where
websites partially block Tor. One good example of this case is
https://dan.me.uk/, which doesn't block Tor exit relays completely but
returns an error page (a partial block) without an error HTTP response
code. Checking the HTTP response codes is the low-hanging fruit and is our
first step. It performs well but might sometimes produce false positives
(for example, reporting a website like https://cloudflare.com as completely
blocked when it returns a CAPTCHA or is only partially blocked).
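As a rough sketch of how the status-code check could feed into the page
comparison (again, not the actual project code): the function names, the
verdict labels and the 0.8 threshold below are all assumptions chosen for
illustration.

```python
def first_pass(tor_status, control_status):
    """Status-code-only check; cheap, but blind to partial blocks."""
    if control_status >= 400:
        # The site fails even without Tor, so nothing can be concluded.
        return "inconclusive"
    if tor_status >= 400:
        return "blocked"
    return "needs-comparison"


def classify(tor_status, control_status, similarity, threshold=0.8):
    """Combine the status-code check with a page similarity score in [0, 1].

    `similarity` is assumed to come from a structure or content comparison
    of the Tor fetch against the control fetch; `threshold` is arbitrary.
    """
    verdict = first_pass(tor_status, control_status)
    if verdict != "needs-comparison":
        return verdict
    return "ok" if similarity >= threshold else "partially-blocked"
```

With this shape, a page that answers 200 over Tor but looks nothing like
the control fetch ends up as "partially-blocked", while a CAPTCHA page
served with an error status (say 403, used here only as an illustration)
is still labelled "blocked", which is the kind of false positive described
above.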
Further, for demo purposes, one can refer to the experimental code [5] and
its log [6] (it isn't great code and is a bit old, but it was written to
back up the first method, the structure of the website).
One could also look at `Analyzer.py` [7,8], which contains the most recent
and improved analysis logic. I hope to improve it with every passing day. I
also plan to create a FAQ [9] page with excerpts of discussions and answers
explaining why a given approach was taken.
Thanks,
Apratim
(irc: _ranchak_)
** Looking forward to suggestions and comments on how to improve it.
Materials such as research papers in this domain would also be helpful. **
References:
[1]
https://hub.osc.dial.community/t/tor-project-alexa-top-sites-captcha-and-bl…
[2]
https://gitlab.torproject.org/woswos/CAPTCHA-Monitor/-/wikis/GSoC-2021#upda…
[3] http://people.cs.georgetown.edu/~wzhou/publication/senser-acsac13.pdf
[4] https://seclab.cs.georgetown.edu/msherr/
[5]
https://github.com/Hackhard/Fetcher/blob/b9f2fa8d09061862cf954537cbaad7921d…
[6]
https://raw.githubusercontent.com/Hackhard/Fetcher/main/status%20code/test_…
[7] Consensus_lite branch:
https://gitlab.torproject.org/woswos/CAPTCHA-Monitor/-/blob/consensus_lite/…
[8] Master branch:
https://gitlab.torproject.org/woswos/CAPTCHA-Monitor/-/blob/master/src/capt…
[9]
https://gitlab.torproject.org/woswos/CAPTCHA-Monitor/-/wikis/GSoC-2021/Faqs
Hi Everyone,
As part of our Sponsor 30 work, we are looking to improve the new
about:torconnect experience by adding automatic tor settings
configuration for censorship circumvention.
This document outlines and discusses the *technical* challenges
associated with this work, and does not go into any great detail on what
the right UX would be (in terms of ease of use, user trust, etc.).
Anyway, if you see any pitfalls or problems with anything here, do let
us know.
------------------------------8<--------------------------------------
# Mostly Automatic Bridge Configuration to Bypass Internet Censorship
Our goal for this work is to enable Tor Browser users to access tor
without having to navigate to about:preferences#tor to configure
bridges. Technically speaking, this is a trivial problem assuming you know:
- which bridge settings work at the user's location
- the location of the user
## Circumvention Settings Map
For now, it seems sufficient to maintain a map of countries to some
data-structure containing information about which censorship
circumvention techniques work and which ones do not. A proposed example
format can be found here:
-
https://gitlab.torproject.org/tpo/anti-censorship/state-of-censorship/-/blo…
This map would be distributed and updated through tor-browser releases.
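For illustration only, such a map could be sketched as below. This is not
the proposed format linked above; the field names, the placeholder country
codes ("xa", "xb") and the entries themselves are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CircumventionSettings:
    """Which techniques are believed to work or fail in one location."""
    working: tuple = ()
    blocked: tuple = ()


# Hypothetical entries keyed by lower-case country code. "xa" and "xb" are
# placeholder codes, not real countries.
SETTINGS_MAP = {
    "xa": CircumventionSettings(working=("obfs4", "snowflake"),
                                blocked=("direct",)),
    "xb": CircumventionSettings(working=("snowflake",),
                                blocked=("direct", "obfs4")),
}

# Locations without an entry are assumed to need no bridges at all.
DEFAULT_SETTINGS = CircumventionSettings(working=("direct",))


def settings_for(country_code):
    """Look up settings for a country code, falling back to the default."""
    return SETTINGS_MAP.get(country_code.lower(), DEFAULT_SETTINGS)
```

Keeping the lookup behind a single function like `settings_for` would
leave room to swap the in-browser map for a fetched one later.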
### Problems
#### Censorship Changes Invalidate the Map
The obvious problem with distributing the censorship-circumvention
settings map with Tor Browser is that if the techniques used in a
location change such that old settings no longer work, you will be left
with a non-functional Tor Browser with no way to update it apart from
acquiring a fresh install with the updated settings or by manually
configuring Tor Browser's bridge settings (which is what users have to do
now).
A fix for this would be to provide a rules update mechanism whereby
updated rules could be fetched outside of tor (via the clearnet, or over
moat). Special care would need to be taken to ensure the rule updates
from this automatic mechanism actually came from the Tor Project (via
some sort of signature verification scheme, for example).
Another wrinkle here is that rules would also need to be distributed
somewhere that is difficult to censor. It seems likely that we may need
different locations and mechanisms for acquiring the rule-set based on
the user's location.
Whatever the mechanism, updates should happen at least before the user
attempts to auto-configure. Otherwise, perhaps we should periodically
auto-update the settings at a reasonable cadence.
#### Time Investment to Update Map
Another problem with solely distributing the rules through Tor Browser
is that censorship events would now require a Tor Browser release just
to push new rules out to people. Publishing new Tor Browser releases is
not a simple task, and enabling adversaries to force Tor Browser
releases by tweaking their censorship systems seems like a cute way to
DDoS the Applications team.
An alternate update channel is definitely necessary outside of periodic
Tor Browser releases.
#### Are Per-Country Entries Granular Enough?
One could imagine highly localized censorship events occurring which
require special settings that are not needed in the rest of the country.
For instance, if there is a clearnet blackout in Minneapolis, would we
want to pipe *all* of our US users through the same bridges? Seems like
a potential scalability problem for countries with large populations.
## Determining User Location
A user's location can be determined by accessing location services
through the clearnet. Mozilla offers such a service (
https://location.services.mozilla.com/ ) with a very simple HTTP
interface. Prior to bootstrapping, Tor Browser can access the location
service by temporarily enabling network DNS:
- network.dns.disabled=false
and making an exception for the location service URL to bypass the proxy by:
- network.proxy.no_proxies_on="location.services.mozilla.com"
The location service would send back a country code in a JSON object
which we can use to look up appropriate bridge settings in our map
described above.
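A minimal sketch of the response handling, assuming the JSON object
carries the code under a `country_code` key (an assumption about the
service's response shape that should be checked against its API
documentation). The function name and the sample payload are made up, and
no network request is made here.

```python
import json


def country_from_response(body):
    """Extract a lower-case two-letter country code, or None if unusable."""
    try:
        payload = json.loads(body)
    except ValueError:
        # Not valid JSON (json.JSONDecodeError is a ValueError subclass).
        return None
    if not isinstance(payload, dict):
        return None
    code = payload.get("country_code")
    if isinstance(code, str) and len(code) == 2 and code.isalpha():
        return code.lower()
    return None


# Example with a placeholder payload; "XA" is not a real country code.
sample = '{"country_code": "XA", "country_name": "Example"}'
country = country_from_response(sample)
```

Returning None for anything unexpected lets the caller fall back to asking
the user for their country instead of guessing.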
### Problems
So the functionality of this approach is pretty easy to implement: tweak
some prefs, make an XMLHttpRequest, change the prefs back.
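The one detail worth getting right is that the prefs are restored even
when the request fails. Here is a sketch of that pattern, using a plain
dict as a stand-in for the browser's pref store (the real thing would live
in Tor Browser's own code; `temporary_prefs` is a made-up helper):

```python
from contextlib import contextmanager

_MISSING = object()  # marks prefs that had no value before the override


@contextmanager
def temporary_prefs(prefs, overrides):
    """Apply `overrides` to `prefs`, then restore the old values on exit."""
    saved = {name: prefs.get(name, _MISSING) for name in overrides}
    prefs.update(overrides)
    try:
        yield prefs
    finally:
        # Runs on success and on failure, so a failed lookup cannot leave
        # DNS or the proxy bypass enabled.
        for name, old in saved.items():
            if old is _MISSING:
                prefs.pop(name, None)
            else:
                prefs[name] = old
```

The location request would go inside the `with temporary_prefs(...)` block,
with `network.dns.disabled` and `network.proxy.no_proxies_on` passed as the
overrides.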
One possible problem we may face is if censors start blocking Mozilla's
location services. Maybe we should have a pool of location service
providers to make this more difficult (though we would need to do the
research and figure out how feasible this is from a cost perspective).
It is also possible to add location service functionality to moat,
though this would also be a bit of an engineering endeavor.
If we move forward with Mozilla's location services, we will need to
acquire an API key, but I would not expect this to be an issue. We will
also need to make arrangements with them to surpass the current limit of
100,000 daily API requests ( see:
https://location.services.mozilla.com/terms )
The big challenge here is engineering the right UX which maintains our
users' trust. I think we need to be very explicit with this convenience
feature, and definitely not just have it silently happen in the
background. Users should also be able to opt out and manually select
their country for the purposes of getting the right settings out of the
above-mentioned map.
It should be very difficult to accidentally enable this automatic
lookup. This will likely require a fair bit of iteration on the
about:torconnect page design and flow.