Hello guys,
I have a problem that I currently can't solve, so here I am ;) I have a Python script that uses python-stem to create and handle a Tor instance (on a defined port). It retrieves a web page (using an HTTP GET) and submits information (using HTTP POST messages). Basically, I use Tor because I need to test this server from different IP addresses, with several requests in parallel. I also keep track of cookies. Here's a sample of the code I use, based on the example on the stem website, https://stem.torproject.org/tutorials/to_russia_with_love.html (to get more parallel requests, I launch the script several times with a different socks_port value):

----------------------------
import socket, socks, stem.process
import mechanize, cookielib

SOCKS_PORT = 9000
DATA_DIRECTORY = "TOR_%s" % SOCKS_PORT
socks.setdefaultproxy(socks.PROXY_TYPE_SOCKS5, '127.0.0.1', SOCKS_PORT)
socket.socket = socks.socksocket

tor_process = stem.process.launch_tor_with_config(
  config = {
    'SocksPort': str(SOCKS_PORT),
    'ControlPort': str(SOCKS_PORT+1),
    'DataDirectory': DATA_DIRECTORY,
    'ExitNodes': '{it}',
  },
)

# initialize python mechanize, with cookies (it works exactly like urllib2, urllib3, etc. already tried...)
br = mechanize.Browser()
cj = cookielib.LWPCookieJar()
br.set_cookiejar(cj)
...

for number in num_list:
  req = br.open_novisit("http://example.com") #_1_
  res = req.read()
  print res
  req.close()
  req2 = br.open("http://example.com/post_to_me", data_to_post) #_2_
  res2 = req2.read()
  req2.close()
--------------------------------
And that's it. The problem occurs on the lines I marked as _1_ and _2_: when it reaches around 200 requests, it seems to block indefinitely, waiting for a response that never comes. Of course, Wireshark doesn't help because the traffic is encrypted. The same code, without Tor, works perfectly. So why does it get stuck at about 200 requests!? I tried:
1. Telnetting to the control port and forcing circuit renewal with SIGNAL NEWNYM
2. Instantiating mechanize (urllib2, 3, whatever) inside the loop
3. ...I don't remember what else
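For reference, the first item in the list can also be done from a script instead of telnet. Below is a minimal Python 3 sketch of that exchange over the control port, assuming the control port accepts an empty AUTHENTICATE (no password or cookie configured, which appears to be the case in the config above). `send_newnym` is a hypothetical helper, and `start_fake_control_port` is only a local stand-in that answers `250 OK` so the example runs without Tor. Note that NEWNYM only applies to new connections and Tor may rate-limit it, so it would not be expected to unblock a request that is already in flight.

```python
import socket
import threading


def send_newnym(host, port, timeout_seconds=5.0):
    """Send AUTHENTICATE and SIGNAL NEWNYM; return the first reply line for each."""
    replies = []
    with socket.create_connection((host, port), timeout=timeout_seconds) as conn:
        with conn.makefile("rb") as reader:
            for command in (b"AUTHENTICATE\r\n", b"SIGNAL NEWNYM\r\n"):
                conn.sendall(command)
                # A real control port answers "250 OK" on success.
                replies.append(reader.readline().decode("ascii").strip())
    return replies


def start_fake_control_port():
    """Hypothetical stand-in for Tor: accepts one client and acknowledges two commands."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    server.listen(1)
    received = []

    def serve():
        conn, _ = server.accept()
        with conn, conn.makefile("rb") as reader:
            for _ in range(2):
                received.append(reader.readline().decode("ascii").strip())
                conn.sendall(b"250 OK\r\n")

    threading.Thread(target=serve, daemon=True).start()
    return server, received
```

Against a real instance launched as above, the call would be `send_newnym("127.0.0.1", SOCKS_PORT + 1)`. stem's own `Controller.signal(Signal.NEWNYM)` does the same job with proper authentication handling.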
I thought it could be a local socket connection limit: without Tor, I see in Wireshark that the source port changes every time a request is performed. But I don't know whether the problem is using the same source port every time (I don't think so) and, if so, should I close the current socket and open a new one? Should I kill the tor process? I can't explain why... All I know is: *when the script gets stuck, if I kill the python process (ctrl+c) and then re-launch it, it starts working again.* I've seen that it's possible to set the value of TrackHostExitsExpire; is it useful in my case?
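One way to turn the indefinite block into something diagnosable is a socket timeout, so that a request which never gets a reply returns or raises instead of waiting forever. The sketch below is Python 3 and standard library only; it demonstrates the idea against a local stand-in server that accepts a connection and then stays silent. `read_with_timeout` and `start_silent_server` are hypothetical names, and this does not explain why the stall appears at around 200 requests.

```python
import socket
import threading


def read_with_timeout(host, port, timeout_seconds):
    """Connect and wait for data; return None if nothing arrives in time."""
    with socket.create_connection((host, port), timeout=timeout_seconds) as conn:
        try:
            return conn.recv(4096)
        except socket.timeout:
            # The peer accepted the connection but never answered.
            return None


def start_silent_server():
    """Hypothetical stand-in for a peer that accepts a connection and never replies."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    server.listen(1)
    accepted = []  # keeps the accepted connection open so the client sees silence

    def serve():
        accepted.append(server.accept())

    threading.Thread(target=serve, daemon=True).start()
    return server, accepted
```

In the script above, the analogous step would presumably be calling `socket.setdefaulttimeout(...)` before the loop, assuming the SOCKS-wrapped socket honours it; the iteration at which the timeout fires then shows exactly which request stalled.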
Thanks in advance to whoever can help me!!
Ed
Hi Eduard. At first glance I'm not aware of any resource that would be exhausted by this after 200 iterations. Does the issue reproduce if you just do 400 GETs or 400 POSTs, or does the problem only arise when you do a combination of 200 of each? Have you tried running netstat or another connection resolver when it gets stuck to see if you have 400 open connections (that is to say, checking that this is terminating the connections as expected)?
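The netstat check suggested above can be scripted so it is easy to repeat while the loop runs. This is a minimal Python 3 sketch that tallies TCP connection states for one port from `netstat -tn` style text; it assumes the Linux column layout (proto, recv-q, send-q, local, foreign, state), and both `count_states_for_port` and the sample rows are made up for illustration.

```python
from collections import Counter


def count_states_for_port(netstat_output, port):
    """Count TCP states for rows whose local or foreign address uses `port`."""
    suffix = ":%d" % port
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers and anything that is not a full TCP row.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, foreign, state = fields[3], fields[4], fields[5]
        if local.endswith(suffix) or foreign.endswith(suffix):
            states[state] += 1
    return states


# Illustrative input only; not output captured from the script in this thread.
SAMPLE = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 127.0.0.1:9000          127.0.0.1:51234         ESTABLISHED
tcp        0      0 127.0.0.1:51234         127.0.0.1:9000          ESTABLISHED
tcp        0      0 127.0.0.1:51200         127.0.0.1:9000          TIME_WAIT
tcp        0      0 192.168.1.5:22          192.168.1.9:40000       ESTABLISHED
"""
```

A count of ESTABLISHED or CLOSE_WAIT rows that keeps growing with each iteration would point at connections not being closed; a small, steady count would point elsewhere.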
On a side note I'm a little concerned about you running multiple instances of this script to run through multiple relays. 400 requests x N instances would be quite a bit of traffic to dump on the network. In addition, each instance is downloading the full tor consensus and microdescriptors since it does not share a data directory. What exactly is the goal of your script?
Cheers! -Damian
On Thu, Aug 1, 2013 at 10:00 AM, Eduard Natale eduard.natale@gmail.com wrote:
tor-dev mailing list tor-dev@lists.torproject.org https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev
Hi Damian, and thanks for your answer. Actually, I only need 2 (at most) different IPs at a time from different countries; that's why I need to run 2 instances of my script with separate circuits. I have to test a CGI script on a server of mine, and each request is delayed 2 seconds from the next one, so I think this shouldn't be an issue for the Tor network. So we don't have 400 x N requests but 1 request x 2 instances every 2 seconds. In netstat I see at most 4 established connections (related to tor), and socklist gives me the same result.
By the way, let's consider only one instance of the script. It blocks at N ~= 200 requests (it makes a GET first, N/2 in total, and then a POST, N/2 in total; both are blocking requests). With logging enabled, the last message is:
[debug] parse_socks(): socks5: ipv4 address type

Previous similar messages were followed by:
[debug] connection_ap_handshake_process_socks(): socks handshake not all here yet. #or Client asked for [scrubbed]:80
[debug] connection_ap_handshake_process_socks(): entered.
[debug] connection_ap_handshake_process_socks(): socks handshake not all here yet.
[debug] conn_write_callback(): socket 64 wants to write.
...

But once I got:
[warn] Your application (using socks5 to port 80) is giving Tor only an IP address. Applications that do DNS resolves themselves may leak information. Consider using Socks4A (e.g. via privoxy or socat) instead. For more information, please see https://wiki.torproject.org/TheOnionRouter/TorFAQ#SOCKSAndDNS

So it seems that the script gets stuck here: connection_ap_handshake_rewrite_and_attach()
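For context on the `socks5: ipv4 address type` debug line and the DNS warning: in a SOCKS5 CONNECT request (RFC 1928) the client names the destination either as a raw IPv4 address (address type 0x01) or as a hostname (address type 0x03), and only the second form lets Tor do the DNS lookup. The Python 3 sketch below builds both forms by hand so the difference is visible. `build_connect_request` is a hypothetical helper, not part of stem, socks or mechanize, and nothing in the logs establishes that the address type is what causes the hang.

```python
import ipaddress
import struct

SOCKS_VERSION = 0x05
CMD_CONNECT = 0x01
RESERVED = 0x00
ATYP_IPV4 = 0x01    # destination given as 4 raw address bytes
ATYP_DOMAIN = 0x03  # destination given as a length-prefixed hostname


def build_connect_request(host, port):
    """Build the bytes of a SOCKS5 CONNECT request for host:port."""
    header = bytes([SOCKS_VERSION, CMD_CONNECT, RESERVED])
    try:
        # A dotted-quad string means the name was already resolved locally.
        address = bytes([ATYP_IPV4]) + ipaddress.IPv4Address(host).packed
    except ipaddress.AddressValueError:
        # Otherwise pass the hostname through so the proxy resolves it.
        name = host.encode("ascii")
        if not 0 < len(name) <= 255:
            raise ValueError("hostname must be 1 to 255 bytes long")
        address = bytes([ATYP_DOMAIN, len(name)]) + name
    # The port is two bytes in network (big-endian) order.
    return header + address + struct.pack("!H", port)
```

Calling it with a dotted-quad address produces the form Tor warns about, while calling it with `"example.com"` produces the form that leaves resolution to Tor.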
Thanks a lot Damian, all
Eduard
On 2 Aug 2013 at 19:41, "Damian Johnson" atagar@torproject.org wrote: