Sorry about the truncated log - the full one is gzipped here: http://www.charliebelmer.com/scallion.log.gz
I'll do some Chutney testing - I saw what looked like a good tutorial on this list earlier this month so let me give it a shot.
Thanks!
On Tue, Sep 10, 2013 at 6:17 AM, Karsten Loesing karsten@torproject.orgwrote:
On 9/7/13 5:57 PM, Charlie Belmer wrote:
Hey Karsten & Dev team,
I didn't mean to take this off channel, so my apologies for that.
I ran a few more tests this week (and my apologies on the slowness of
some
of this work - life has been hectic this past month), but I am not
finding
much of use. I could not find any errors which are not also present in
when
N23 is disabled. If you want to browse the logs for yourself, I uploaded
a
partial log from a run to
http://www.charliebelmer.com/scallion.n23.log(730MB).
That log ends at simulated time 0:19:34, so 5:30 minutes before the actual requests start. Can you post both full logs somewhere, maybe gzipped to save bandwidth?
The most telling entries I found so far are below, but I see the same errors whether UseN23 is set to 1 or not, so it is likely not the root cause. I'll keep digging.
Okay.
Another option would be to run your branch in Chutney to find the bug and then return to Shadow for performance measurements. That's what I did with #9166 and it helped a lot. Let me know if you need help with that.
Thanks! Karsten
Error 1 (Seems to occur when opening channel): 0:0:49:406270 [thread-2] 0:9:41:348157618 [scallion-info] [4uthority1-73.1.0.0] [scalliontor_logmsg_cb] bw_weights_error_t networkstatus_check_weights(int64_t, int64_t, int64_t, int64_t, int64_t, int64_t, int64_t, int64_t, int64_t, int64_t, int64_t, int64_t, int64_t, int64_t, int)(): Bw weight mismatch 1. G=214 M=12174 E=165 D=27654
T=40207
Wmd=4769 Wme=-724848 Wmg=0 Wed=461 Wee=734848 Wgd=4769 Wgg=10000 Wme=-724848 Wmg=0 0:0:49:407692 [thread-2] 0:9:41:348157618 [scallion-info] [4uthority1-73.1.0.0] [scalliontor_logmsg_cb] bw_weights_error_t networkstatus_check_weights(int64_t, int64_t, int64_t, int64_t, int64_t, int64_t, int64_t, int64_t, int64_t, int64_t, int64_t, int64_t, int64_t, int64_t, int)(): Bw weight mismatch 1. G=214 M=12174 E=165 D=27654
T=40207
Wmd=4769 Wme=-724848 Wmg=0 Wed=461 Wee=734848 Wgd=4769 Wgg=10000 Wme=-724848 Wmg=0
Error 2: 0:0:54:227263 [thread-2] 0:10:55:012000001 [scallion-debug] [4uthority1-73.1.0.0] [scalliontor_logmsg_cb] void
conn_read_callback(int,
short, void *)(): socket 1000264 wants to read. 0:0:54:227281 [thread-2] 0:10:55:012000001 [scallion-debug] [4uthority1-73.1.0.0] [scalliontor_logmsg_cb] int connection_read_to_buf(connection_t *, ssize_t *, int *)(): 1000264: starting, inbuf_datalen 0 (0 pending in tls object). at_most 16448. 0:0:54:227302 [thread-2] 0:10:55:012000001 [shadow-debug] [4uthority1-73.1.0.0] [_tcp_endOfFileSignalled] 73.1.0.0:10021(descriptor 1000264) <-> 91.1.0.0:9111: signaling close to user, socket no longer
usable
0:0:54:227321 [thread-2] 0:10:55:012000001 [scallion-debug] [4uthority1-73.1.0.0] [scalliontor_logmsg_cb] TLS error: unexpected close while reading (SSL negotiation finished successfully) 0:0:54:227338 [thread-2] 0:10:55:012000001 [scallion-debug] [4uthority1-73.1.0.0] [scalliontor_logmsg_cb] int tor_tls_read(tor_tls_t
*,
char *, size_t)(): read returned r=0, err=-8 0:0:54:227354 [thread-2] 0:10:55:012000001 [scallion-debug] [4uthority1-73.1.0.0] [scalliontor_logmsg_cb] int connection_read_to_buf(connection_t *, ssize_t *, int *)(): TLS
connection
closed on read. Closing. (Nickname nonexit10, address 91.1.0.0) 0:0:54:227373 [thread-2] 0:10:55:012000001 [scallion-debug] [4uthority1-73.1.0.0] [scalliontor_logmsg_cb] void channel_close_for_error(channel_t *)(): Closing channel 0x7f546d03ce40
due
to lower-layer error 0:0:54:227389 [thread-2] 0:10:55:012000001 [scallion-debug] [4uthority1-73.1.0.0] [scalliontor_logmsg_cb] void channel_change_state(channel_t *, channel_state_t)(): Changing state of channel 0x7f546d03ce40 (global ID 21) from "open" to "closing" 0:0:54:227420 [thread-2] 0:10:55:012000001 [scallion-debug] [4uthority1-73.1.0.0] [scalliontor_logmsg_cb] void channel_remove_from_digest_map(channel_t *)(): Removed channel 0x7f546d03ce40 (global ID 21) from identity map in state closing (4) with digest BBF25A366EA5376B0026CE6EDE3DF536211429BE 0:0:54:227440 [thread-2] 0:10:55:012000001 [shadow-debug] [4uthority1-73.1.0.0] [tcp_close] 73.1.0.0:10021 (descriptor 1000264)
<->
91.1.0.0:9111: user closed connection 0:0:54:227452 [thread-2] 0:10:55:012000001 [shadow-debug] [4uthority1-73.1.0.0] [_tcp_setState] 73.1.0.0:10021 (descriptor
<-> 91.1.0.0:9111: moved from TCP state 'TCPS_CLOSEWAIT' to
'TCPS_LASTACK'
0:0:54:227473 [thread-2] 0:10:55:012000001 [scallion-debug] [4uthority1-73.1.0.0] [scalliontor_logmsg_cb] void connection_mark_for_close_internal_(connection_t *, int, const char *)(): Calling connection_mark_for_close_internal_() on an OR conn at /home/coffee/shadow/shadow/build/tor/src/or/connection.c:2932 0:0:54:227491 [thread-2] 0:10:55:012000001 [scallion-debug] [4uthority1-73.1.0.0] [scalliontor_logmsg_cb] int conn_close_if_marked(int)(): Cleaning up connection (fd -1). 0:0:54:227509 [thread-2] 0:10:55:012000001 [scallion-debug] [4uthority1-73.1.0.0] [scalliontor_logmsg_cb] void circuit_n_chan_done(channel_t *, int)(): chan to NULL/91.1.0.0:9111, status=0 0:0:54:227546 [thread-2] 0:10:55:012000001 [scallion-debug] [4uthority1-73.1.0.0] [scalliontor_logmsg_cb] void channel_change_state(channel_t *, channel_state_t)(): Changing state of channel 0x7f546d03ce40 (global ID 21) from "closing" to "channel error" 0:0:54:227565 [thread-2] 0:10:55:012000001 [scallion-debug] [4uthority1-73.1.0.0] [scalliontor_logmsg_cb] int connection_remove(connection_t *)(): removing socket -1 (type OR),
n_conns
now 26
On Tue, Sep 3, 2013 at 10:47 AM, Karsten Loesing <karsten@torproject.org wrote:
On 9/3/13 4:32 PM, Charlie Belmer wrote:
Hi Karsten,
I have run several tests, but I haven't been able to satisfy myself
yet.
I
attached the results from my latest round of tests for your review. I
am
comparing the n23 branch (UseN23 1 set in each torrc) with the initial
head
I merged n23 with. It seems that for some reason, the n23 branch
completes
far fewer downloads, though I haven't yet found the cause, or found anything interesting in the logs.
$ grep "fg-download-complete" m1.large-tor/data/scallion.log | wc -l 8827 $ grep "fg-download-complete" m1.large-n23/scallion.log | wc -l 308
I would definitely appreciate any thoughts, insights, or advice!
Interesting! Can you re-run the n23 simulation on info or even debug log level? That might reveal some more information. You'll do that by running "scallion --log-level=INFO". A tiny network should be sufficient for that. Watch out for filling disks.
If you want me to look at the scallion.log file, feel free to post it somewhere, and I'll see if something looks unusual to me.
Thanks, Karsten
On Tue, Sep 3, 2013 at 8:58 AM, Karsten Loesing <
karsten@torproject.org
wrote:
Hi Charlie,
did you have the chance to simulate your rebased n23-5 branch using
Shadow?
If not, maybe I can help by simulating it once and writing down more specific instructions for you?
Thanks! Karsten
On 8/8/13 10:36 AM, Karsten Loesing wrote:
(Moving this discussion back to tor-dev@ with Charlie's permission.)
On 8/6/13 4:10 AM, Charlie Belmer wrote: > Hey Karsten, > > I've been reading up on Shadow & Scallion to do some performance
testing on
> N23. Before I go too far down that road, I wanted to check in with
you.
> > Are there existing blueprints or best practices I should be
following?
Thanks for looking into this!
So, I wouldn't call the following notes blueprints or best practices, but they are a description of what I would do to run Shadow
simulations
of your branch.
First of all, be sure to read the Shadow wiki which has contains lots
of
interesting stuff: https://github.com/shadow/shadow/wiki
The next step would be to get a vanilla Tor master (that your branch
is
based on) running in a minimal or tiny Shadow network. It's probably easiest to use a local Ubuntu VM for this. Or you can start an
m1.large
EC2 instance for the tiny network, or an even smaller instance for
the
minimal network.
Once that simulation succeeds, you'll want to run another simulation using your branch, extract performance metrics using Shadow's analyze script, produce a PDF with graphs also using the analyze script, and then wonder if things are faster than before.
The minimal or tiny networks are mostly there to learn that things
are
working as expected. You'll probably want larger networks for real performance results, but starting small to get everything working is probably easier.
Sounds doable, I guess. However, here's where things start to get tricky: in order to simulate Tor master (or a branch based on it) in Shadow, you'll have to perform quite a few extra steps like compiling your own CMake and Clang/LLVM. Not really hard, but you need to get
all
pieces together before Shadow does what you want.
I'm pasting detailed instructions that I used to simulate Steven's
uTP
branch in Shadow below as an example. Not everything applies to your branch, but it should give you the general idea.
If something remains unclear, please feel free to ask!
> Depending how it goes, I'd also look at re-trying adaptive 23 or
tuning
the
> existing implementation.
Neat! We should probably include Mashael in that discussion once you have a working and simulated n23-5 branch. Maybe she has ideas what parts need tweaking. I'm cc'ing her in this mail so she knows what you're working on.
In any case, having a working rebased n23-5 branch is already a big
step
forward.
Thanks! Karsten
=== START OF INSTRUCTIONS ===
Launch new EC2 instance running Ubuntu Server 13.04 64 bit
Set ulimit -n to 25000, see Shadow wiki for details
https://github.com/shadow/shadow/wiki/Preparing-your-machine#system-configs-...
$ sudo apt-get update $ sudo apt-get upgrade
Clone and test-compile modified libutp
$ sudo apt-get install build-essential git automake libssl-dev
libevent-dev
$ mkdir src $ cd src/ $ git clone https://github.com/sjmurdoch/libutp $ cd libutp/ $ make $ cd ../
Clone and test-compile Tor branch utp
$ git clone https://git.torproject.org/tor.git $ cd tor/ $ git remote add sjm217 https://git.torproject.org/sjm217/tor.git $ git fetch sjm217 $ git checkout -b utp sjm217/utp $ ./autogen.sh $ LDFLAGS="-L/home/ubuntu/src/libutp"
CFLAGS="-I/home/ubuntu/src/libutp"
LIBS="-lutp -lrt" ./configure --disable-asciidoc $ make $ make distclean $ cd ../
Modify Tor to use uTP for all links (or "0 &&" for not using uTP for
any
link)
diff --git a/src/or/channeltls.c b/src/or/channeltls.c index 0551b73..b7b36e1 100644 --- a/src/or/channeltls.c +++ b/src/or/channeltls.c @@ -418,7 +418,7 @@ channel_tls_connect(const tor_addr_t *addr,
uint16_t
port, /* Create a uTP connection */ tor_addr_to_sockaddr(addr, port, (struct sockaddr*)&sin,
sizeof(sin));
tor_addr_to_str(addr_str, addr, sizeof(addr_str), 0);
- if (!strncmp(addr_str, "128.232.10.129", sizeof(addr_str))) {
- if (1 || !strncmp(addr_str, "128.232.10.129", sizeof(addr_str))) { log_info(LD_CHANNEL, "Trying uTP connection to %s", addr_str); tlschan->utp = UTP_Create(tor_UTPSendToProc, tlschan, (const
struct
sockaddr*)&sin,
Install CMake 2.8.10
$ wget http://www.cmake.org/files/v2.8/cmake-2.8.10.2.tar.gz $ tar xf cmake-2.8.10.2.tar.gz $ cd cmake-2.8.10.2/ $ ./configure $ make $ export PATH=~/src/cmake-2.8.10.2/bin:$PATH $ cd ../ $ which cmake # result: /home/ubuntu/src/cmake-2.8.10.2/bin/cmake
Build Clang and LLVM from source
$ wget http://llvm.org/releases/3.2/llvm-3.2.src.tar.gz $ wget http://llvm.org/releases/3.2/clang-3.2.src.tar.gz $ tar xaf llvm-3.2.src.tar.gz $ tar xaf clang-3.2.src.tar.gz $ cp -R clang-3.2.src llvm-3.2.src/tools/clang $ cd llvm-3.2.src/ $ mkdir build $ cd build/ $ cmake -DCMAKE_INSTALL_PREFIX=/home/ubuntu/.local ../. $ make $ make install $ export PATH=~/.local/bin/:$PATH $ cd ../../ $ which clang # result: /home/ubuntu/.local/bin//clang
Install Shadow
$ sudo apt-get install gcc xz-utils make automake autoconf cmake tidy libtidy-dev libglib2.0 libglib2.0-dev dstat pdftk python2.7 python-matplotlib python-numpy python-scipy $ git clone https://github.com/shadow/shadow.git $ cd shadow $ ./setup dependencies $ ./setup build -g -i /home/ubuntu/src/libutp -i /usr/include/x86_64-linux-gnu/c++/4.7 -l /home/ubuntu/src/libutp --tor-prefix /home/ubuntu/src/tor --tor-lib utp $ ./setup install $ export PATH=~/.shadow/bin/:$PATH
Simulate tiny network in Shadow
$ cd resource/examples/scallion/ $ tar xf tiny-4GB-m1.large.tar.xz $ cd tiny-4GB-m1.large/ $ scallion
Graph simulation results
Add one line to beginning of analyze.py:
import matplotlib; matplotlib.use('Agg')
For additional traffic simulations:
$ grep -v "[traffic-" scallion.log > scallion-notraffic.log
$ cd ~/src/ $ mkdir results $ cd results/ $ python ~/src/shadow/contrib/analyze.py parse --output uTP-1
~/src/shadow/resource/examples/scallion/tiny-4GB-m1.large/data-utp-1/scallion.log
$ python ~/src/shadow/contrib/analyze.py parse --output uTP-0
~/src/shadow/resource/examples/scallion/tiny-4GB-m1.large/data-utp-0/scallion.log
$ python ~/src/shadow/contrib/analyze.py parse --output vanilla
~/src/shadow/resource/examples/scallion/tiny-4GB-m1.large/data-vanilla/scallion.log
$ python ~/src/shadow/contrib/analyze.py plot --title "sjm217's utp branch (#9166)" --prefix "sjm217-utp" --data vanilla/ "vanilla 0.2.4.4-alpha" --data uTP-1/ "uTP for all links" --data uTP-0/ "uTP
for
none of the links"
=== END OF INSTRUCTIONS ===