Hey Karsten & Dev team,

I didn't mean to take this off channel, so my apologies for that.

I ran a few more tests this week (and my apologies on the slowness of some of this work - life has been hectic this past month), but I am not finding much of use. I could not find any errors which are not also present in when N23 is disabled. If you want to browse the logs for yourself, I uploaded a partial log from a run to http://www.charliebelmer.com/scallion.n23.log (730MB).

The most telling entries I found so far are below, but I see the same errors whether UseN23 is set to 1 or not, so it is likely not the root cause. I'll keep digging.

Error 1 (Seems to occur when opening channel):
0:0:49:406270 [thread-2] 0:9:41:348157618 [scallion-info] [4uthority1-73.1.0.0] [scalliontor_logmsg_cb] bw_weights_error_t networkstatus_check_weights(int64_t, int64_t, int64_t, int64_t, int64_t, int64_t, int64_t, int64_t, int64_t, int64_t, int64_t, int64_t, int64_t, int64_t, int)(): Bw weight mismatch 1. G=214 M=12174 E=165 D=27654 T=40207 Wmd=4769 Wme=-724848 Wmg=0 Wed=461 Wee=734848 Wgd=4769 Wgg=10000 Wme=-724848 Wmg=0
0:0:49:407692 [thread-2] 0:9:41:348157618 [scallion-info] [4uthority1-73.1.0.0] [scalliontor_logmsg_cb] bw_weights_error_t networkstatus_check_weights(int64_t, int64_t, int64_t, int64_t, int64_t, int64_t, int64_t, int64_t, int64_t, int64_t, int64_t, int64_t, int64_t, int64_t, int)(): Bw weight mismatch 1. G=214 M=12174 E=165 D=27654 T=40207 Wmd=4769 Wme=-724848 Wmg=0 Wed=461 Wee=734848 Wgd=4769 Wgg=10000 Wme=-724848 Wmg=0

Error 2:
0:0:54:227263 [thread-2] 0:10:55:012000001 [scallion-debug] [4uthority1-73.1.0.0] [scalliontor_logmsg_cb] void conn_read_callback(int, short, void *)(): socket 1000264 wants to read.
0:0:54:227281 [thread-2] 0:10:55:012000001 [scallion-debug] [4uthority1-73.1.0.0] [scalliontor_logmsg_cb] int connection_read_to_buf(connection_t *, ssize_t *, int *)(): 1000264: starting, inbuf_datalen 0 (0 pending in tls object). at_most 16448.
0:0:54:227302 [thread-2] 0:10:55:012000001 [shadow-debug] [4uthority1-73.1.0.0] [_tcp_endOfFileSignalled] 73.1.0.0:10021 (descriptor 1000264) <-> 91.1.0.0:9111: signaling close to user, socket no longer usable
0:0:54:227321 [thread-2] 0:10:55:012000001 [scallion-debug] [4uthority1-73.1.0.0] [scalliontor_logmsg_cb] TLS error: unexpected close while reading (SSL negotiation finished successfully)
0:0:54:227338 [thread-2] 0:10:55:012000001 [scallion-debug] [4uthority1-73.1.0.0] [scalliontor_logmsg_cb] int tor_tls_read(tor_tls_t *, char *, size_t)(): read returned r=0, err=-8
0:0:54:227354 [thread-2] 0:10:55:012000001 [scallion-debug] [4uthority1-73.1.0.0] [scalliontor_logmsg_cb] int connection_read_to_buf(connection_t *, ssize_t *, int *)(): TLS connection closed on read. Closing. (Nickname nonexit10, address 91.1.0.0)
0:0:54:227373 [thread-2] 0:10:55:012000001 [scallion-debug] [4uthority1-73.1.0.0] [scalliontor_logmsg_cb] void channel_close_for_error(channel_t *)(): Closing channel 0x7f546d03ce40 due to lower-layer error
0:0:54:227389 [thread-2] 0:10:55:012000001 [scallion-debug] [4uthority1-73.1.0.0] [scalliontor_logmsg_cb] void channel_change_state(channel_t *, channel_state_t)(): Changing state of channel 0x7f546d03ce40 (global ID 21) from "open" to "closing"
0:0:54:227420 [thread-2] 0:10:55:012000001 [scallion-debug] [4uthority1-73.1.0.0] [scalliontor_logmsg_cb] void channel_remove_from_digest_map(channel_t *)(): Removed channel 0x7f546d03ce40 (global ID 21) from identity map in state closing (4) with digest BBF25A366EA5376B0026CE6EDE3DF536211429BE
0:0:54:227440 [thread-2] 0:10:55:012000001 [shadow-debug] [4uthority1-73.1.0.0] [tcp_close] 73.1.0.0:10021 (descriptor 1000264) <-> 91.1.0.0:9111:  user closed connection
0:0:54:227452 [thread-2] 0:10:55:012000001 [shadow-debug] [4uthority1-73.1.0.0] [_tcp_setState] 73.1.0.0:10021 (descriptor 1000264) <-> 91.1.0.0:9111: moved from TCP state 'TCPS_CLOSEWAIT' to 'TCPS_LASTACK'
0:0:54:227473 [thread-2] 0:10:55:012000001 [scallion-debug] [4uthority1-73.1.0.0] [scalliontor_logmsg_cb] void connection_mark_for_close_internal_(connection_t *, int, const char *)(): Calling connection_mark_for_close_internal_() on an OR conn at /home/coffee/shadow/shadow/build/tor/src/or/connection.c:2932
0:0:54:227491 [thread-2] 0:10:55:012000001 [scallion-debug] [4uthority1-73.1.0.0] [scalliontor_logmsg_cb] int conn_close_if_marked(int)(): Cleaning up connection (fd -1).
0:0:54:227509 [thread-2] 0:10:55:012000001 [scallion-debug] [4uthority1-73.1.0.0] [scalliontor_logmsg_cb] void circuit_n_chan_done(channel_t *, int)(): chan to NULL/91.1.0.0:9111, status=0
0:0:54:227546 [thread-2] 0:10:55:012000001 [scallion-debug] [4uthority1-73.1.0.0] [scalliontor_logmsg_cb] void channel_change_state(channel_t *, channel_state_t)(): Changing state of channel 0x7f546d03ce40 (global ID 21) from "closing" to "channel error"
0:0:54:227565 [thread-2] 0:10:55:012000001 [scallion-debug] [4uthority1-73.1.0.0] [scalliontor_logmsg_cb] int connection_remove(connection_t *)(): removing socket -1 (type OR), n_conns now 26




On Tue, Sep 3, 2013 at 10:47 AM, Karsten Loesing <karsten@torproject.org> wrote:
On 9/3/13 4:32 PM, Charlie Belmer wrote:
> Hi Karsten,
>
> I have run several tests, but I haven't been able to satisfy myself yet. I
> attached the results from my latest round of tests for your review. I am
> comparing the n23 branch (UseN23 1 set in each torrc) with the initial head
> I merged n23 with. It seems that for some reason, the n23 branch completes
> far fewer downloads, though I haven't yet found the cause, or found
> anything interesting in the logs.
>
> $ grep "fg-download-complete" m1.large-tor/data/scallion.log | wc -l
> 8827
> $ grep "fg-download-complete" m1.large-n23/scallion.log | wc -l
> 308
>
> I would definitely appreciate any thoughts, insights, or advice!

Interesting!  Can you re-run the n23 simulation on info or even debug
log level?  That might reveal some more information.  You'll do that by
running "scallion --log-level=INFO".  A tiny network should be
sufficient for that.  Watch out for filling disks.

If you want me to look at the scallion.log file, feel free to post it
somewhere, and I'll see if something looks unusual to me.

Thanks,
Karsten


> On Tue, Sep 3, 2013 at 8:58 AM, Karsten Loesing <karsten@torproject.org>wrote:
>
>> Hi Charlie,
>>
>> did you have the chance to simulate your rebased n23-5 branch using Shadow?
>>
>> If not, maybe I can help by simulating it once and writing down more
>> specific instructions for you?
>>
>> Thanks!
>> Karsten
>>
>>
>> On 8/8/13 10:36 AM, Karsten Loesing wrote:
>>> (Moving this discussion back to tor-dev@ with Charlie's permission.)
>>>
>>> On 8/6/13 4:10 AM, Charlie Belmer wrote:
>>>> Hey Karsten,
>>>>
>>>> I've been reading up on Shadow & Scallion to do some performance
>> testing on
>>>> N23. Before I go too far down that road, I wanted to check in with you.
>>>>
>>>> Are there existing blueprints or best practices I should be following?
>>>
>>> Thanks for looking into this!
>>>
>>> So, I wouldn't call the following notes blueprints or best practices,
>>> but they are a description of what I would do to run Shadow simulations
>>> of your branch.
>>>
>>> First of all, be sure to read the Shadow wiki which has contains lots of
>>> interesting stuff: https://github.com/shadow/shadow/wiki
>>>
>>> The next step would be to get a vanilla Tor master (that your branch is
>>> based on) running in a minimal or tiny Shadow network.  It's probably
>>> easiest to use a local Ubuntu VM for this.  Or you can start an m1.large
>>> EC2 instance for the tiny network, or an even smaller instance for the
>>> minimal network.
>>>
>>> Once that simulation succeeds, you'll want to run another simulation
>>> using your branch, extract performance metrics using Shadow's analyze
>>> script, produce a PDF with graphs also using the analyze script, and
>>> then wonder if things are faster than before.
>>>
>>> The minimal or tiny networks are mostly there to learn that things are
>>> working as expected.  You'll probably want larger networks for real
>>> performance results, but starting small to get everything working is
>>> probably easier.
>>>
>>> Sounds doable, I guess.  However, here's where things start to get
>>> tricky: in order to simulate Tor master (or a branch based on it) in
>>> Shadow, you'll have to perform quite a few extra steps like compiling
>>> your own CMake and Clang/LLVM.  Not really hard, but you need to get all
>>> pieces together before Shadow does what you want.
>>>
>>> I'm pasting detailed instructions that I used to simulate Steven's uTP
>>> branch in Shadow below as an example.  Not everything applies to your
>>> branch, but it should give you the general idea.
>>>
>>> If something remains unclear, please feel free to ask!
>>>
>>>> Depending how it goes, I'd also look at re-trying adaptive 23 or tuning
>> the
>>>> existing implementation.
>>>
>>> Neat!  We should probably include Mashael in that discussion once you
>>> have a working and simulated n23-5 branch.  Maybe she has ideas what
>>> parts need tweaking.  I'm cc'ing her in this mail so she knows what
>>> you're working on.
>>>
>>> In any case, having a working rebased n23-5 branch is already a big step
>>> forward.
>>>
>>> Thanks!
>>> Karsten
>>>
>>>
>>> === START OF INSTRUCTIONS ===
>>>
>>> Launch new EC2 instance running Ubuntu Server 13.04 64 bit
>>>
>>> Set ulimit -n to 25000, see Shadow wiki for details
>>>
>> https://github.com/shadow/shadow/wiki/Preparing-your-machine#system-configs-and-limits
>>>
>>> $ sudo apt-get update
>>> $ sudo apt-get upgrade
>>>
>>> Clone and test-compile modified libutp
>>>
>>> $ sudo apt-get install build-essential git automake libssl-dev
>> libevent-dev
>>> $ mkdir src
>>> $ cd src/
>>> $ git clone https://github.com/sjmurdoch/libutp
>>> $ cd libutp/
>>> $ make
>>> $ cd ../
>>>
>>> Clone and test-compile Tor branch utp
>>>
>>> $ git clone https://git.torproject.org/tor.git
>>> $ cd tor/
>>> $ git remote add sjm217 https://git.torproject.org/sjm217/tor.git
>>> $ git fetch sjm217
>>> $ git checkout -b utp sjm217/utp
>>> $ ./autogen.sh
>>> $ LDFLAGS="-L/home/ubuntu/src/libutp" CFLAGS="-I/home/ubuntu/src/libutp"
>>> LIBS="-lutp -lrt" ./configure --disable-asciidoc
>>> $ make
>>> $ make distclean
>>> $ cd ../
>>>
>>> Modify Tor to use uTP for all links (or "0 &&" for not using uTP for any
>>> link)
>>>
>>> diff --git a/src/or/channeltls.c b/src/or/channeltls.c
>>> index 0551b73..b7b36e1 100644
>>> --- a/src/or/channeltls.c
>>> +++ b/src/or/channeltls.c
>>> @@ -418,7 +418,7 @@ channel_tls_connect(const tor_addr_t *addr, uint16_t
>>> port,
>>>    /* Create a uTP connection */
>>>    tor_addr_to_sockaddr(addr, port, (struct sockaddr*)&sin, sizeof(sin));
>>>    tor_addr_to_str(addr_str, addr, sizeof(addr_str), 0);
>>> -  if (!strncmp(addr_str, "128.232.10.129", sizeof(addr_str))) {
>>> +  if (1 || !strncmp(addr_str, "128.232.10.129", sizeof(addr_str))) {
>>>      log_info(LD_CHANNEL,
>>>               "Trying uTP connection to %s", addr_str);
>>>      tlschan->utp = UTP_Create(tor_UTPSendToProc, tlschan, (const struct
>>> sockaddr*)&sin,
>>>
>>> Install CMake 2.8.10
>>>
>>> $ wget http://www.cmake.org/files/v2.8/cmake-2.8.10.2.tar.gz
>>> $ tar xf cmake-2.8.10.2.tar.gz
>>> $ cd cmake-2.8.10.2/
>>> $ ./configure
>>> $ make
>>> $ export PATH=~/src/cmake-2.8.10.2/bin:$PATH
>>> $ cd ../
>>> $ which cmake # result: /home/ubuntu/src/cmake-2.8.10.2/bin/cmake
>>>
>>> Build Clang and LLVM from source
>>>
>>> $ wget http://llvm.org/releases/3.2/llvm-3.2.src.tar.gz
>>> $ wget http://llvm.org/releases/3.2/clang-3.2.src.tar.gz
>>> $ tar xaf llvm-3.2.src.tar.gz
>>> $ tar xaf clang-3.2.src.tar.gz
>>> $ cp -R clang-3.2.src llvm-3.2.src/tools/clang
>>> $ cd llvm-3.2.src/
>>> $ mkdir build
>>> $ cd build/
>>> $ cmake -DCMAKE_INSTALL_PREFIX=/home/ubuntu/.local ../.
>>> $ make
>>> $ make install
>>> $ export PATH=~/.local/bin/:$PATH
>>> $ cd ../../
>>> $ which clang # result: /home/ubuntu/.local/bin//clang
>>>
>>> Install Shadow
>>>
>>> $ sudo apt-get install gcc xz-utils make automake autoconf cmake tidy
>>> libtidy-dev libglib2.0 libglib2.0-dev dstat pdftk python2.7
>>> python-matplotlib python-numpy python-scipy
>>> $ git clone https://github.com/shadow/shadow.git
>>> $ cd shadow
>>> $ ./setup dependencies
>>> $ ./setup build -g -i /home/ubuntu/src/libutp -i
>>> /usr/include/x86_64-linux-gnu/c++/4.7 -l /home/ubuntu/src/libutp
>>> --tor-prefix /home/ubuntu/src/tor --tor-lib utp
>>> $ ./setup install
>>> $ export PATH=~/.shadow/bin/:$PATH
>>>
>>> Simulate tiny network in Shadow
>>>
>>> $ cd resource/examples/scallion/
>>> $ tar xf tiny-4GB-m1.large.tar.xz
>>> $ cd tiny-4GB-m1.large/
>>> $ scallion
>>>
>>> Graph simulation results
>>>
>>> Add one line to beginning of analyze.py:
>>>
>>> import matplotlib; matplotlib.use('Agg')
>>>
>>> For additional traffic simulations:
>>>
>>> $ grep -v "\[traffic-" scallion.log > scallion-notraffic.log
>>>
>>> $ cd ~/src/
>>> $ mkdir results
>>> $ cd results/
>>> $ python ~/src/shadow/contrib/analyze.py parse --output uTP-1
>>>
>> ~/src/shadow/resource/examples/scallion/tiny-4GB-m1.large/data-utp-1/scallion.log
>>> $ python ~/src/shadow/contrib/analyze.py parse --output uTP-0
>>>
>> ~/src/shadow/resource/examples/scallion/tiny-4GB-m1.large/data-utp-0/scallion.log
>>> $ python ~/src/shadow/contrib/analyze.py parse --output vanilla
>>>
>> ~/src/shadow/resource/examples/scallion/tiny-4GB-m1.large/data-vanilla/scallion.log
>>> $ python ~/src/shadow/contrib/analyze.py plot --title "sjm217's utp
>>> branch (#9166)" --prefix "sjm217-utp" --data vanilla/ "vanilla
>>> 0.2.4.4-alpha" --data uTP-1/ "uTP for all links" --data uTP-0/ "uTP for
>>> none of the links"
>>>
>>> === END OF INSTRUCTIONS ===
>>>
>>>
>>
>>
>