Hi Roger,
Thank you very much for replying to my appeal. I hope that other people who are critical of Stegotorus will jump in along the way.
- Like the FTE paper...
I'm going to discuss your first point at the end because it is not criticism of Stegotorus per se, but of http transport in general.
- There also remains the issue of where you get your covertexts. While FTE says "we will build a brilliant regexp to characterize the format of the thing we hide our content in" (which has its own problems -- anything your regexp misses is a crack in the armor), Stegotorus says "we will build a big library of example things, by crawling the Internet, and then we'll hide our content in them". Where does this library come from? How does every Stegotorus bridge get its own library? What happens when you reuse an item in your library? How do *clients* generate their own library? I think there are lots of ways to lose plausibility that haven't been explored.
2') One of the proposed ways for clients to generate their library of plausible covertexts is to basically wiretap the user and then replay her own traffic later with the Tor flow embedded in it. First there are messy engineering questions to tapping the user in a portable way; but I worry even more about the privacy issues introduced by repeating earlier traffic. Also, does it introduce new distinguishing attacks, like "look for variations on the same request"?
So one thing I implemented last summer to address 2 and 2', at least at the plausibility level, is this:
- To set up ST-server, you give it the address of an HTTP server and a list of the files stored on that server (alternatively, if you are running Apache httpd on the same machine, ST-server will go and make a list of all files in the Apache document_root).
- The client requests "http://ST-server/".
- In reply, ST-server sends the hash of the file list.
- ST-client compares it with the hash of its own file list (which can be zero if the client has no list).
- If they match, they start serving right away.
- If not, ST-server sends the file list to the client.
(All of this happens inside the steg layer, of course.)
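The handshake above can be sketched roughly as follows. This is only an illustration, not the actual Stegotorus code: the class and function names are made up, and the choice of SHA-256 over the sorted list (with "0" standing for an empty list) is an assumption, since the hash is not specified here.

```python
import hashlib


def file_list_hash(paths):
    """Hash a file list independently of its ordering.

    Assumption: an empty list is represented by "0", mirroring the
    "can be zero if the client has no list" remark.
    """
    if not paths:
        return "0"
    joined = "\n".join(sorted(paths)).encode("utf-8")
    return hashlib.sha256(joined).hexdigest()


class Server:
    """Hypothetical stand-in for ST-server."""

    def __init__(self, paths):
        self.paths = list(paths)

    def hello(self):
        # Reply to the client's first request with the hash of the list.
        return file_list_hash(self.paths)

    def send_list(self):
        return list(self.paths)


class Client:
    """Hypothetical stand-in for ST-client."""

    def __init__(self, paths=None):
        self.paths = list(paths or [])

    def sync(self, server):
        """Return True if the file list had to be transferred."""
        if file_list_hash(self.paths) == server.hello():
            return False  # lists agree, start serving right away
        self.paths = server.send_list()
        return True
```

A fresh client transfers the list once; on the next connection the hashes match and no transfer is needed.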
Moreover, now not only are the steg modules pluggable, the payload (covertext) loader is pluggable too, so if you don't like this way of loading payloads you can write your own payload loader and ask ST to use that instead.
I recognize that *not* using real client traffic also allows problems, e.g. "why is that user, who usually uses IE, sending a user-agent of chromium?"
This can be addressed as a ticket. cURL already has an option to pretend to be other agents. You can just ask the user to browse to 127.0.0.1:ST-Port and get the agent that way.
After all, I don't think this is a critical issue. First, most people share valid IPs (I can't remember one place in Iran where I got a valid IP), so the filter has no way to tell whether two people are using the connection or one person is using two browsers. Besides, it is not uncommon to use two browsers on the same computer.
The other thing is that an HTTP GET request doesn't have much room to maneuver, so it is much easier to make it look exactly like that of a specific browser.
In the long term we can make a Firefox plug-in to interact with ST on the client side. It shouldn't be hard because the payload provider is pluggable now (which also takes care of the communication on the client side).
- What's the overhead of putting your Tor traffic through each of the steg modules? It's my understanding that some of the Stegotorus steg modules produce immense size overhead (since the cover-item is large, and the part of the cover-item you can hide your message in is relatively small).
Well, this is the usual efficiency/security trade-off question. What would you tell somebody who says "but I browse the web much faster without Tor..."? The answer is: if you have an idea for providing the same service with higher efficiency, then go for it. Between no access and inefficient access, I personally prefer the latter. If obfsproxy can get by, I'm not going to use ST, but what if it can't?
Practically speaking, I have watched the entire "behind enemy lines" of you and Jacob through ST on quite a good connection, and apart from the 2 or 3 times it stopped, I had an OK experience watching it.
I can come up with the exact value of the overhead, (traffic size in bytes with ST)/(traffic size in bytes without ST), if you give me some time.
What are the numbers for the current steg modules that people are talking about / have built?
There are still 3 or 4 of them: pdf, swf, js, and js in html. Adding a steg module seems to be the least of the problems and the most outsourceable task. It is relatively easy to take an existing steg algorithm and feed it to ST as a steg module.
Is there some correlation between inefficiency (overhead) and plausibility (indistinguishability)?
Obviously. You can choose the "no-steg" steg module and then you won't have any overhead.
What are the tradeoffs if we adopt some sort of "choose the covertext from your library that minimizes your overhead" policy?
Stegotorus already has something like that: it considers 10 random candidates and, among them, chooses the one with the least overhead that offers the capacity requested by the chopper.
It is a matter of a few lines of code to say "only use the xx% of covertexts with the least overhead" (size/capacity; the payload loader is aware of both).
Another possible approach is to ask the user for a security factor (between 0 and 1); based on that, the steg module knows how much change it is allowed to make to a covertext.
Writing a simple classifier (using scikit-learn, e.g.) which computes simple statistical features to test the effectiveness of the security factor (telling HTTP from ST) isn't a terribly ambitious task either.
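To make the selection policy concrete, here is a minimal sketch of both ideas: picking the lowest-overhead item among 10 random candidates, and restricting the library to the xx% with the least overhead. The names (Covertext, pick_covertext, lowest_overhead_fraction) are hypothetical and are not taken from the Stegotorus source; overhead is taken to be size/capacity.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Covertext:
    name: str
    size: int      # total bytes sent on the wire
    capacity: int  # bytes of hidden data the item can carry

    @property
    def overhead(self):
        # size/capacity; an item that can carry nothing is infinitely costly
        return self.size / self.capacity if self.capacity else float("inf")


def pick_covertext(library, needed, sample_size=10, rng=random):
    """Sample candidates at random and return the lowest-overhead one
    whose capacity covers `needed` bytes, or None if none fits."""
    candidates = rng.sample(library, min(sample_size, len(library)))
    fitting = [c for c in candidates if c.capacity >= needed]
    if not fitting:
        return None
    return min(fitting, key=lambda c: c.overhead)


def lowest_overhead_fraction(library, fraction):
    """Keep only the given fraction of covertexts with the least overhead."""
    ranked = sorted(library, key=lambda c: c.overhead)
    keep = max(1, int(len(ranked) * fraction))
    return ranked[:keep]
```

Note that the tighter the fraction, the more often the same few covertexts get reused, which is exactly the plausibility trade-off being asked about.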
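As a rough idea of what "simple stat features" could mean, the sketch below computes a small feature vector per payload. The particular features (length, byte entropy, fraction of printable bytes) are my own assumptions for illustration; the resulting vectors could then be fed to any off-the-shelf classifier, for example one from scikit-learn, trained on labelled plain-HTTP and ST payloads.

```python
import math
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def features(payload: bytes):
    """Return [length, byte entropy, fraction of printable ASCII bytes]."""
    if not payload:
        return [0, 0.0, 0.0]
    printable = sum(1 for b in payload if 32 <= b < 127)
    return [len(payload), byte_entropy(payload), printable / len(payload)]
```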
- And then the last issue isn't so much a design issue as a community or resource issue -- Zack is busy being a student,
I think by now I have made changes to every part of the code besides the crypto module, and I know the code quite well. I have the same problem as Zack, though I'm expecting to graduate soon. But if we tell every potential contributor "don't touch ST, it will burn", then the community won't grow.
Anyway, I think putting all of the above points on the trac as tickets is probably the first step. If ST gets a page with a short explanation and a link to its tickets (like Flashproxy does), maybe it will attract more attention.
and further development by SRI is complicated by their pub review requirement (which alas applies to their code contributions too).
I don't know about this "pub review requirement". Could you give a ref to read more about it?
And now item 1.
- Like the FTE paper (https://www.torproject.org/docs/pluggable-transports), the main contribution of Stegotorus is to provide a framework for plugging in steg modules. There are several example steg modules to choose from. The idea is that even if the ones they offer now aren't suitable, if you *had* a good one, you could just pop it in. The trouble is that I don't know of any good ones, and I think that's a harder problem than people think.
I think the actual question is "are we going to provide an HTTP transport or not?" I think the scenario "I'm going to close many ports besides 80 and X and Y, then do simple DPI so people don't divert their https to 80" is a very likely one.
The problem that you think is "a harder problem than people think" is the secure steg problem, which we are not trying to solve here. We are banking on the fact that an accurate multilevel statistical analysis is too expensive for a DPI that needs to judge GBs of data per second.
At the expense of efficiency, you can make the steg quite hard to detect. Suppose you only use JPEG images and you only use the LSB of the cosine coefficients. I don't see any easy way for a DPI to detect the steg on the go.
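For illustration, LSB embedding in the coefficients amounts to something like the sketch below. It deliberately works on a plain list of integers standing in for quantized DCT coefficients; decoding and re-encoding an actual JPEG file is out of scope here, and the function names are made up for this example.

```python
def embed_lsb(coeffs, bits):
    """Overwrite the least significant bit of the first len(bits)
    coefficients with the message bits; each value changes by at most 1."""
    if len(bits) > len(coeffs):
        raise ValueError("message does not fit in this covertext")
    out = list(coeffs)
    for i, bit in enumerate(bits):
        # Clear the lowest bit, then set it to the message bit.
        # Python's integer semantics make this work for negatives too.
        out[i] = (out[i] & ~1) | bit
    return out


def extract_lsb(coeffs, n):
    """Read back the first n hidden bits."""
    return [c & 1 for c in coeffs[:n]]
```

The capacity is one bit per coefficient used, which is why the overhead of such a module is large compared with the size of the image.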
I think having some thorough explorations of 1-3 would put us in a much better position.
In a nutshell, ST doesn't stop you from writing better steg modules or payload loaders, so that covers 2, 2' and 3, while it provides a working version of them right now. Item 1 is asking whether an HTTP transport is worth it; I think the answer is yes.
Thanks again. I'm going to make some tickets for the more concrete aspects of the above points and share them with the list.
Best,
Vmon
Message: 2
Date: Sun, 13 Jan 2013 23:47:44 -0700
From: Bin Wang binwang.cu@gmail.com
To: tor-dev@lists.torproject.org
Subject: [tor-dev] Multiple Tor
Message-ID: CAJHCcVbzQcXkQZurORupLLEJBm9cJATkKqYOYkUr=km-bo34pA@mail.gmail.com
Content-Type: text/plain; charset="iso-8859-1"
Dear Guys,
I am brand new to Tor and I feel that running multiple Tors should be considered. By multiple Tors I mean not only multiple instances, but also a different proxy port for each (like what has been done here: http://www.howtoforge.com/ultimate-security-proxy-with-tor).
I am trying to get started with 4 Tors. However, the tutorial applies to Arch Linux and I am using a headless 64-bit Ubuntu on EC2. It is really a pain going through the differences between Arch and Ubuntu. So I am wondering whether anyone could offer some help to implement my idea simply.
- Four Tors running at the same time, each with an individual port; Privoxy or Polipo or whatever is OK as long as it works. Like:
8118 <- Privoxy <- TOR <- 9050
8129 <- Privoxy <- TOR <- 9150
8230 <- Privoxy <- TOR <- 9250
8321 <- Privoxy <- TOR <- 9350
- In this way, if I try to return the IP of 127.0.0.1:8118, 8129, 8230 and 8321, they should return four different IPs, which indicates that four different Tors are running at the same time. Then, a few minutes later, when I check again, all four of them should have new IPs.
I know my simple 'dream' could come true in many ways, however... I am not only new to Tor, but also to bash and Python... That is why I came here, to see whether some of you could enlighten me.
These links might be useful:
http://blog.databigbang.com/distributed-scraping-with-multiple-tor-circuits/
https://www.torservers.net/wiki/setup/server#multiple_tor_processes
http://www.howtoforge.com/ultimate-security-proxy-with-tor
Best,
Bin Wang
On Wed, Jan 16, 2013 at 01:23:03AM -0700, vmonmoonshine@gmail.com wrote:
If ST gets a page with a short explanation and a link to its tickets (like Flashproxy does), maybe it will attract more attention.
We should definitely put a page like that up. Let us know if we can do anything to help! :)
(asn or I or others can do the website commit)
I think we (you? :) should work on a TBB bundle (like the other pluggable transport bundles people are working on) that ships the simplest most efficient steg module(s) we've got, so Stegotorus can get some actual users and start collecting usability/crash/etc bug reports.
I bet Alexandre et al would be happy to bundle in another thing, with their current obfsproxy-and-flashproxy bundle.
How big an issue is https://trac.torproject.org/projects/tor/ticket/7153 to such a combined bundle?
and further development by SRI is complicated by their pub review requirement (which alas applies to their code contributions too).
I don't know about this "pub review requirement". Could you give a ref to read more about it?
Their relationship with their funder means that the funder wants 30 days of warning before they publish anything (papers or source code). Also, the funder can tell them not to publish something (though in practice they probably won't ever tell them that).
Fortunately, Tor has no such constraint (if we did, we would probably try to arrange it so nothing we publish was written "because of" the funding, and turn down the funding if that failed).
--Roger