On Wed, Dec 19, 2012 at 2:29 PM, Simon simonhf@gmail.com wrote: [...]
Maybe there is no automated testing for any Tor projects? At least a quick search on the wiki only found [1] which lists possible ways to test (but was created 7 months ago and apparently not updated since and collecting dust) and [2] discussing a manual test procedure for TBB. However, tor-0.2.3.25.tar.gz does reveal some test files but the source code ratio of production code to test code is not inspiring at first glance:
[...]
Be aware that we've also been using 'chutney' and 'experimentor' for integration testing. They supplement coverage a bit, though they need more tests, and each tends to hide certain classes of error.
Tor seems to have good planning compared to most open source projects. So I would be interested in hearing why testing is apparently 'falling between the cracks'. Why isn't there just 10 times more test LOC?
Not because of any hatred or disapproval of tests--just because nobody's written that 100 kloc of testing code yet.
I think that's for three main reasons:
- We were in a hurry when we wrote lots of the stuff we wrote.
- Large parts of the codebase have been written in a tightly coupled style that needs refactoring before it can be tested without a live Tor network at hand.
- Until Tor 0.2.2, our testing framework didn't let us have tests that touched global state, which made our tests in general pretty fragile.
What about implementing a new policy immediately: Any new production LOC committed must be covered by tests, or peer reviewed and democratically excluded?
Goodness knows we need more review and testing.
It doesn't seem fair to reject patches for lacking test coverage when they are patches to code that itself lacks coverage, though. If you write a 5-line patch to connection_ap_rewrite_and_attach, for instance, you probably shouldn't be on the hook for refactoring the whole function to make it testable, though you will be hard-pressed to write any tests for that monster without a complete refactor.
It might be a reasonable goal to try to set a plan for increasing test coverage by a certain percentage with each release.
If you like and you have time, it would be cool to stop by the tickets on trac.torproject.org for milestone "Tor: 0.2.4.x-final" in state "needs_review" and look to see whether you think any of them have code that would be amenable to new tests, or to look through currently untested functions and try to figure out how to make more of them tested and testable.
yrs, -- Nick
On Wed, Dec 19, 2012 at 1:49 PM, Nick Mathewson nickm@alum.mit.edu wrote:
On Wed, Dec 19, 2012 at 2:29 PM, Simon simonhf@gmail.com wrote: [...]
Tor seems to have good planning compared to most open source projects. So I would be interested in hearing why testing is apparently 'falling between the cracks'. Why isn't there just 10 times more test LOC?
Not because of any hatred or disapproval of tests--just because nobody's written that 100 kloc of testing code yet.
I think that's for three main reasons:
- We were in a hurry when we wrote lots of the stuff we wrote.
I'm not trying to start a flame war but this sounds similar to excuses from fat people who don't want to do exercise :-) So I'm just going to pretend you never wrote this :-)
- Large parts of the codebase have been written in a tightly coupled
style that needs refactoring before it can be tested without a live Tor network at hand.
Much automated (unit) testing is done by mocking data structures used by functions and/or mocking functions used by functions. This is possible even with tight coupling.
Personally I think the most effective way to test with code coverage is to test at the system/integration level to get the majority of low-hanging-fruit coverage, and then make up the rest of the coverage with more complicated-to-write unit testing. For the system/integration level testing then it would be great to actually start up a complete test Tor network e.g. on localhost, containing all the components necessary for end-to-end testing using real UDP & TCP traffic. Maybe some of the production code isn't beneficial for such an end-to-end automated test right now, but that's the beauty of developers writing their own tests; they can change the production code to make it more easily support such activities. With an end-to-end Tor network test then I would guess that 'happy path' coverage would jump up to somewhere between 40% and 60%. At least numbers in this range are what I have seen with other projects.
- Until Tor 0.2.2, our testing framework didn't let us have tests
that touched global state, which made our tests in general pretty fragile.
What about implementing a new policy immediately: Any new production LOC committed must be covered by tests, or peer reviewed and democratically excluded?
Goodness knows we need more review and testing.
It doesn't seem fair to reject patches for lacking test coverage when they are patches to code that itself lacks coverage, though. If you write a 5-line patch to connection_ap_rewrite_and_attach, for instance, you probably shouldn't be on the hook for refactoring the whole function to make it testable, though you will be hard-pressed to write any tests for that monster without a complete refactor.
I agree with you that it seems unfair. But the alternative is systematically writing tests to cover all existing code, which is unrealistic and will never happen. Is there any other alternative? The developer who submits the patch has already comprehended the code in question and is therefore in an excellent position to create the necessary automated tests. Even if the patch comes without tests, then presumably the person reviewing and integrating the patch can start a discussion and/or add the test for coverage themselves if necessary.
It might be a reasonable goal to try to set a plan for increasing test coverage by a certain percentage with each release.
One thing is for sure: coverage figures won't improve much without developer discipline :-( I've also seen teams where coverage is enforced, but only prior to releasing, and where coverage expectations are set below 100%, e.g. at 70% to 90%. Personally I think this is not good for a bunch of reasons: the code being covered is long forgotten by the developer, so writing the test code takes unnecessarily long. The developers/testers writing the test code will just go for the low-hanging fruit, just to get the coverage numbers up. Having to go back and revisit code feels like a chore and makes the word coverage seem like a dirty word in the team instead of the joyous word it should be :-) It's a case of 'Look after the pennies and the pounds will look after themselves' :-)
If you like and you have time, it would be cool to stop by the tickets on trac.torproject.org for milestone "Tor: 0.2.4.x-final" in state "needs_review" and look to see whether you think any of them have code that would be amenable to new tests, or to look through currently untested functions and try to figure out how to make more of them tested and testable.
If I were you then I'd first try to create an end-to-end system/integration test via localhost that works via make test. This might involve refactoring the production code or even re-arranging source bases etc. The test script would build and/or mock all necessary parts, bring up the localhost Tor network, run a variety of end-to-end tests, and shut down the localhost Tor network. Next the makefiles should be doctored so that it is easier to discover the coverage, e.g. something like make test-coverage? At this point the happy path coverage should be much larger than it is today but still way off the desirable 80% to 100% range. Then one would consider adding the discipline to cover all new lines. The patch author has the personal choice of using unit and/or system/integration level testing to achieve coverage. And there is also a chance that no extra coverage is necessary because the patch is already covered by the happy-path tests.
If you like the end-to-end localhost Tor network idea then I would be happy to collaborate on creating such a mechanism as a first step.
HTH, Simon
yrs,
Nick
On Wed, Dec 19, 2012 at 5:45 PM, Simon simonhf@gmail.com wrote:
On Wed, Dec 19, 2012 at 1:49 PM, Nick Mathewson nickm@alum.mit.edu wrote:
On Wed, Dec 19, 2012 at 2:29 PM, Simon simonhf@gmail.com wrote:
[...]
- Large parts of the codebase have been written in a tightly coupled
style that needs refactoring before it can be tested without a live Tor network at hand.
Much automated (unit) testing is done by mocking data structures used by functions and/or mocking functions used by functions. This is possible even with tight coupling.
What's your favorite C mocking solution for integrating with existing codebases without much disruption?
FWIW, I'd be interested in starting to try some of what you're describing about mandatory coverage in the 0.2.5 release series, for which the merge window should open in Feb/March.
[...]
If you like and you have time, it would be cool to stop by the tickets on trac.torproject.org for milestone "Tor: 0.2.4.x-final" in state "needs_review" and look to see whether you think any of them have code that would be amenable to new tests, or to look through currently untested functions and try to figure out how to make more of them tested and testable.
If I were you then I'd first try to create an end-to-end system/integration test via localhost that works via make test. This might involve refactoring the production code or even re-arranging source bases etc. The test script would build and/or mock all necessary parts, bring up the localhost Tor network, run a variety of end-to-end tests, and shut down the localhost Tor network.
We're a part of the way there, then. Like I said, we've got multiple network mocking/simulation tools. With a simple Chutney network plus the unit tests, we're at ~ 53% coverage... and all Chutney is doing there is setting up a 10-node network and letting it all bootstrap, without actually doing any end-to-end tests.
(ExperimenTor and Shadow are both heavier-weight alternatives for running bigger networks, but I think that here they might not be needed, since their focus seems to be on performance measurement. Chutney is enough for basic integration testing, and has the advantage that it's running unmodified Tor binaries. Stem is interesting here too, since it exercises Tor's control port protocol pretty heavily.)
I've uploaded the gcov output for running the unit tests, then running chutney with the networks/basic configuration, at http://www.wangafu.net/~nickm/volatile/gcov-20121219.tar.xz . (Warning, evil archive file! It will dump all the gcov files in your cwd.)
The 5 most covered modules (by LOC exercised) are:
(module / LOC not exercised / LOC exercised / % exercised)
  dirvote.c.gcov        553   1222   68.85
  config.c.gcov        1429   1229   46.24
  util.c.gcov           470   1352   74.20
  routerparse.c.gcov    932   1436   60.64
  routerlist.c.gcov     858   1509   63.75
The 5 most uncovered modules (by LOC not exercised) are:
(module / LOC not exercised / LOC exercised / % exercised)
  routerparse.c.gcov      932   1436   60.64
  connection_edge.c.gcov  972    384   28.32
  rendservice.c.gcov     1249    202   13.92
  config.c.gcov          1429   1229   46.24
  control.c.gcov         2076    201    8.83
The 5 most uncovered nontrivial modules (by % not exercised) are:
(module / LOC not exercised / LOC exercised / % exercised)
  dnsserv.c.gcov      148    0   0.00
  procmon.c.gcov       48    0   0.00
  rendmid.c.gcov      135    0   0.00
  status.c.gcov        50    0   0.00
  rendclient.c.gcov   506   26   4.89
Next the makefiles should be doctored so that it is easier to discover the coverage, e.g. something like make test-coverage? At this point the happy path coverage should be much larger than it is today but still way off the desirable 80% to 100% range. Then one would consider adding the discipline to cover all new lines. The patch author has the personal choice of using unit and/or system/integration level testing to achieve coverage. And there is also a chance that no extra coverage is necessary because the patch is already covered by the happy-path tests.
If you like the end-to-end localhost Tor network idea then I would be happy to collaborate on creating such a mechanism as a first step.
Yes, I like this idea a lot, especially if you're able to help with it, especially if it's based on an already-existing launch-a-network-on-localhost tool. I'm going to be travelling a lot for the rest of December, but let's set up a time to chat in the new year about how to get started.
Preemptive Happy New Year, -- Nick
On Wed, Dec 19, 2012 at 4:35 PM, Nick Mathewson nickm@alum.mit.edu wrote:
On Wed, Dec 19, 2012 at 5:45 PM, Simon simonhf@gmail.com wrote:
On Wed, Dec 19, 2012 at 1:49 PM, Nick Mathewson nickm@alum.mit.edu wrote:
On Wed, Dec 19, 2012 at 2:29 PM, Simon simonhf@gmail.com wrote:
[...]
- Large parts of the codebase have been written in a tightly coupled
style that needs refactoring before it can be tested without a live Tor network at hand.
Much automated (unit) testing is done by mocking data structures used by functions and/or mocking functions used by functions. This is possible even with tight coupling.
What's your favorite C mocking solution for integrating with existing codebases without much disruption?
This could be worth a separate thread. I'm not aware of really good solutions for C. I have mocked certain system API calls for networking before using e.g. #define recvfrom() ... to cause recvfrom() to be called via an indirect pointer. This causes almost no detectable performance penalty in most cases and allows the test author to not only mock but also to tamper with real results, e.g. on the third invocation. I.e. the indirect pointer for the #define points to recvfrom() by default but can be changed to point to test_recvfrom(), which can optionally call the 'real' recvfrom() and optionally tamper with the results. This technique allows network stack conditions that are very difficult to simulate to be created with ease. The #define mechanism is a chore to set up initially, though.
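To make that concrete, here is a rough self-contained sketch of the indirection. The names recvfrom_ptr and test_recvfrom are made up for illustration and are not Tor code; in a real build the #define shown in the comment would live in a header that only test builds include, so production code keeps calling recvfrom() by its normal name.

#include <sys/types.h>
#include <sys/socket.h>
#include <stdio.h>
#include <string.h>

/*
 * What a test-only header might contain (shown for context, not included
 * here):
 *
 *   extern ssize_t (*recvfrom_ptr)(int, void *, size_t, int,
 *                                  struct sockaddr *, socklen_t *);
 *   #define recvfrom(s, b, l, f, a, al) \
 *           recvfrom_ptr((s), (b), (l), (f), (a), (al))
 */

/* Indirect pointer; by default it points at the real recvfrom(). */
ssize_t (*recvfrom_ptr)(int, void *, size_t, int,
                        struct sockaddr *, socklen_t *) = recvfrom;

/* Test double: returns canned data, and tampers with the 3rd call. */
static int calls = 0;
static ssize_t test_recvfrom(int s, void *buf, size_t len, int flags,
                             struct sockaddr *addr, socklen_t *alen)
{
  (void)s; (void)flags; (void)addr; (void)alen;
  if (++calls == 3) {            /* simulate a truncated/garbled datagram */
    memset(buf, 0, len);
    return 1;
  }
  const char payload[] = "hello";
  size_t n = len < sizeof(payload) ? len : sizeof(payload);
  memcpy(buf, payload, n);
  return (ssize_t)n;
}

int main(void)
{
  char buf[64];
  recvfrom_ptr = test_recvfrom;           /* install the mock            */
  for (int i = 0; i < 3; i++) {
    ssize_t r = recvfrom_ptr(-1, buf, sizeof(buf), 0, NULL, NULL);
    printf("call %d returned %zd bytes\n", i + 1, r);
  }
  recvfrom_ptr = recvfrom;                /* back to the real function   */
  return 0;
}

The mock can also forward most calls to the real recvfrom() and only inject a failure or short read on the call the test cares about.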
I have also thought about experimenting with a different technique for mocking which uses the same technique under the covers but less developer intervention to set it up and maintain it. This could work by using the feature of the C compiler which creates an assembler file from C instead of the usual object file. The assembler file can still be assembled to the object file. The resulting binary is exactly the same except an extra artefact of the build is all the assembler files. Using this mechanism then before compiling the assembler files to object files then the assembler files could be munged, e.g. call my_func in assembler could be changed to call indirect_my_func and another assembler file can be created automatically containing all the indirect pointers. In this way all callable functions could be easily manipulated in unit tests at run-time.
I'd be interested in hearing battle stories about how other people do their mocking. I have heard of the technique of making test functions override production library functions at test link time. But I think this technique isn't as powerful as the above techniques since the original production function isn't available anymore at test run-time.
FWIW, I'd be interested in starting to try some of what you're describing about mandatory coverage in the 0.2.5 release series, for which the merge window should open in Feb/March.
[...]
If you like and you have time, it would be cool to stop by the tickets on trac.torproject.org for milestone "Tor: 0.2.4.x-final" in state "needs_review" and look to see whether you think any of them have code that would be amenable to new tests, or to look through currently untested functions and try to figure out how to make more of them tested and testable.
If I were you then I'd first try to create an end-to-end system/integration test via localhost that works via make test. This might involve refactoring the production code or even re-arranging source bases etc. The test script would build and/or mock all necessary parts, bring up the localhost Tor network, run a variety of end-to-end tests, and shut down the localhost Tor network.
We're a part of the way there, then. Like I said, we've got multiple network mocking/simulation tools. With a simple Chutney network plus the unit tests, we're at ~ 53% coverage... and all Chutney is doing there is setting up a 10-node network and letting it all bootstrap, without actually doing any end-to-end tests.
Sounds good.
I guess Chutney must be a separate project since I can't find it in the Tor sources .tar.gz ?
(ExperimenTor and Shadow are both heavier-weight alternatives for running bigger networks, but I think that here they might not be needed, since their focus seems to be on performance measurement. Chutney is enough for basic integration testing, and has the advantage that it's running unmodified Tor binaries. Stem is interesting here too, since it exercises Tor's control port protocol pretty heavily.)
I've uploaded the gcov output for running the unit tests, then running chutney with the networks/basic configuration, at http://www.wangafu.net/~nickm/volatile/gcov-20121219.tar.xz . (Warning, evil archive file! It will dump all the gcov files in your cwd.)
The 5 most covered modules (by LOC exercised) are:
(module / LOC not exercised / LOC exercised / % exercised)
  dirvote.c.gcov        553   1222   68.85
  config.c.gcov        1429   1229   46.24
  util.c.gcov           470   1352   74.20
  routerparse.c.gcov    932   1436   60.64
  routerlist.c.gcov     858   1509   63.75
The 5 most uncovered modules (by LOC not exercised) are:
(module / LOC not exercised / LOC exercised / % exercised)
  routerparse.c.gcov      932   1436   60.64
  connection_edge.c.gcov  972    384   28.32
  rendservice.c.gcov     1249    202   13.92
  config.c.gcov          1429   1229   46.24
  control.c.gcov         2076    201    8.83
The 5 most uncovered nontrivial modules (by % not exercised) are:
(module / LOC not exercised / LOC exercised / % exercised)
  dnsserv.c.gcov      148    0   0.00
  procmon.c.gcov       48    0   0.00
  rendmid.c.gcov      135    0   0.00
  status.c.gcov        50    0   0.00
  rendclient.c.gcov   506   26   4.89
Next the makefiles should be doctored so that it is easier to discover the coverage, e.g. something like make test-coverage? At this point the happy path coverage should be much larger than it is today but still way off the desirable 80% to 100% range. Then one would consider adding the discipline to cover all new lines. The patch author has the personal choice of using unit and/or system/integration level testing to achieve coverage. And there is also a chance that no extra coverage is necessary because the patch is already covered by the happy-path tests.
If you like the end-to-end localhost Tor network idea then I would be happy to collaborate on creating such a mechanism as a first step.
Yes, I like this idea a lot, especially if you're able to help with it, especially if it's based on an already-existing launch-a-network-on-localhost tool.
I'm not aware of such a tool. The way I have done it in the past is to use Perl to launch and monitor the various processes. The good thing about Perl is that it can run unmodified on both *nix and Windows, plus you can do one-liners. And Perl is also heavily tested itself and comes with various testing frameworks, e.g. [1]. Plus Perl is usually installed already on *nix distributions.
An interesting tidbit about localhost is that it's possible for processes to just listen on any IP in the 127.* IP address space without having to first setup an alias at the NIC level. So for example, process 'a' can just start listening on 127.0.100.1:8080 and process 'b' can just start listening on 127.0.200.1:8080. This is useful for example for testing with many connections, e.g. up to IPs * port range TCP connections (not sure if this is relevant for Tor...). Test scripts written in Perl can then test the end-to-end network. For example, by turning up the verbosity of the logging on certain daemons and monitoring that certain events happen. And/or by talking to daemons directly and expecting certain results. Like the existing Tor unit tests then each fulfilled expectation would result in an extra test 'OK' output.
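For example, a tiny sketch like the following can bind straight to 127.0.100.1 without any NIC setup, at least on Linux, where the whole 127.0.0.0/8 block is routed to the loopback interface by default (BSD/OS X may still need an alias). The address and port are just the example values from above.

#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
  int fd = socket(AF_INET, SOCK_STREAM, 0);
  if (fd < 0) { perror("socket"); return 1; }

  struct sockaddr_in sin;
  memset(&sin, 0, sizeof(sin));
  sin.sin_family = AF_INET;
  sin.sin_port = htons(8080);
  if (inet_pton(AF_INET, "127.0.100.1", &sin.sin_addr) != 1) {
    perror("inet_pton"); return 1;
  }

  /* Note: no "ifconfig lo:1 127.0.100.1" or similar alias was set up. */
  if (bind(fd, (struct sockaddr *)&sin, sizeof(sin)) < 0 ||
      listen(fd, 16) < 0) {
    perror("bind/listen"); return 1;
  }
  printf("listening on 127.0.100.1:8080\n");
  close(fd);
  return 0;
}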
The most important thing is that the testing happens quickly so that developers exercise it all the time. Using make test to start up an end-to-end localhost test with anything from 10 to 100 processes shouldn't be a problem as long as enough RAM is available and the whole thing should take seconds to run all tests.
[1] http://perldoc.perl.org/Test/More.html
I'm going to be travelling a lot for the rest of December, but let's set up a time to chat in the new year about how to get started.
Preemptive Happy New Year,
Ditto. Sure, let's set up a time.
-- Simon
-- Nick
On Wed, Dec 19, 2012 at 8:57 PM, Simon simonhf@gmail.com wrote:
On Wed, Dec 19, 2012 at 4:35 PM, Nick Mathewson nickm@alum.mit.edu wrote:
What's your favorite C mocking solution for integrating with existing codebases without much disruption?
This could be worth a separate thread. I'm not aware of really good solutions for C.
[....]
I had a look around and found a few more possibilities, including:
https://code.google.com/p/test-dept/
http://throwtheswitch.org/white-papers/cmock-intro.html
https://code.google.com/p/cmockery/
None of them looks compellingly great, TBH. The methods people seem to be using involve code-rewriting tricks, mandatory macro tricks, LLVM tricks, x86 assembly tricks, and uglier stuff still.
Perhaps somebody else has a good recommendation? It would be sad if we went and built our own.
[...]
We're a part of the way there, then. Like I said, we've got multiple network mocking/simulation tools. With a simple Chutney network plus the unit tests, we're at ~ 53% coverage... and all Chutney is doing there is setting up a 10-node network and letting it all bootstrap, without actually doing any end-to-end tests.
Sounds good.
I guess Chutney must be a separate project since I can't find it in the Tor sources .tar.gz ?
Yup. It's accessible from gitweb.torproject.org. I'd be surprised if more than 5 people have tried to run it, ever.
(More results: unittests + chutney gives 52.60% coverage. Unittests + stem gives 39.03% coverage. Unit tests + stem + chutney gives 54.49% coverage.)
(ExperimenTor and Shadow are both heavier-weight alternatives for running bigger networks, but I think that here they might not be needed, since their focus seems to be on performance measurement. Chutney is enough for basic integration testing, and has the advantage that it's running unmodified Tor binaries. Stem is interesting here too, since it exercises Tor's control port protocol pretty heavily.)
More links:
https://shadow.cs.umn.edu/
http://crysp.uwaterloo.ca/software/exptor/
I'm not sure anybody's ever tried to do coverage with them.
[..]
Yes, I like this idea a lot, especially if you're able to help with it, especially if it's based on an already-existing launch-a-network-on-localhost tool.
I'm not aware of such a tool.
Chutney is such a tool; ExperimenTor can be made (I think) to act as such a tool; Shadow is a little more complicated.
The way I have done it in the past is to use Perl to launch and monitor the various processes. The good thing about Perl is that it can run unmodified on both *nix and Windows, plus you can do one-liners.
[...]
Hm. I'm not going to say that I'd turn down work in perl, but the rest of the Tor developers don't spend much time using perl. I don't know that any of us have done a perl program of over 100 lines in the last 5-6 years. I'm not saying "perl sucks" or "I refuse to use anything written in perl", but you should be aware that if you do write anything in perl, there probably aren't a lot of other people involved with Tor right now with the knowhow to effectively collaborate on the perl parts or help to maintain them.
[....]
I'm going to be travelling a lot for the rest of December, but let's set up a time to chat in the new year about how to get started.
Preemptive Happy New Year,
Ditto. Sure, let's set up a time.
[email sent off-list]
yrs,
On Thu, Dec 20, 2012 at 11:25 AM, Nick Mathewson nickm@alum.mit.edu wrote:
On Wed, Dec 19, 2012 at 8:57 PM, Simon simonhf@gmail.com wrote:
On Wed, Dec 19, 2012 at 4:35 PM, Nick Mathewson nickm@alum.mit.edu wrote:
What's your favorite C mocking solution for integrating with existing codebases without much disruption?
This could be worth a separate thread. I'm not aware of really good solutions for C.
[....]
I had a look around and found a few more possibilities, including:
https://code.google.com/p/test-dept/
http://throwtheswitch.org/white-papers/cmock-intro.html
https://code.google.com/p/cmockery/
None of them looks compellingly great, TBH. The methods people seem to be using involve code-rewriting tricks, mandatory macro tricks, LLVM tricks, x86 assembly tricks, and uglier stuff still.
Perhaps somebody else has a good recommendation? It would be sad if we went and built our own.
Yep, I've seen those and like you my socks have not been knocked off :-) It is sad, but the reality is that there are relatively few teams actively measuring and/or enforcing code coverage. And of those that do, most say it's too difficult to get coverage above the 70% to 90% range... probably because the tools for mocking etc. don't exist and/or are too limiting, e.g. compiler-specific. And then there are even fewer teams doing high levels of cross-platform coverage. I've only ever heard of one open source project that has 100% code coverage with 1000s of tests that run in seconds and happens to be C and cross-platform too.
[...]
We're a part of the way there, then. Like I said, we've got multiple network mocking/simulation tools. With a simple Chutney network plus the unit tests, we're at ~ 53% coverage... and all Chutney is doing there is setting up a 10-node network and letting it all bootstrap, without actually doing any end-to-end tests.
Sounds good.
I guess Chutney must be a separate project since I can't find it in the Tor sources .tar.gz ?
Yup. It's accessible from gitweb.torproject.org. I'd be surprised if more than 5 people have tried to run it, ever.
:-)
Why make it a separate project? Why not make it part of make test in the Tor project?
(More results: unittests + chutney gives 52.60% coverage. Unittests + stem gives 39.03% coverage. Unit tests + stem + chutney gives 54.49% coverage.)
(ExperimenTor and Shadow are both heavier-weight alternatives for running bigger networks, but I think that here they might not be needed, since their focus seems to be on performance measurement. Chutney is enough for basic integration testing, and has the advantage that it's running unmodified Tor binaries. Stem is interesting here too, since it exercises Tor's control port protocol pretty heavily.)
More links:
https://shadow.cs.umn.edu/
http://crysp.uwaterloo.ca/software/exptor/
I'm not sure anybody's ever tried to do coverage with them.
[..]
Yes, I like this idea a lot, especially if you're able to help with it, especially if it's based on an already-existing launch-a-network-on-localhost tool.
I'm not aware of such a tool.
Chutney is such a tool; ExperimenTor can be made (I think) to act as such a tool; Shadow is a little more complicated.
The way I have done it in the past is to use Perl to launch and monitor the various processes. The good thing about Perl is that it can run unmodified on both *nix and Windows, plus you can do one-liners.
[...]
Hm. I'm not going to say that I'd turn down work in perl, but the rest of the Tor developers don't spend much time using perl. I don't know that any of us have done a perl program of over 100 lines in the last 5-6 years. I'm not saying "perl sucks" or "I refuse to use anything written in perl", but you should be aware that if you do write anything in perl, there probably aren't a lot of other people involved with Tor right now with the knowhow to effectively collaborate on the perl parts or help to maintain them.
We could just write everything in C?
-- Simon
[...]
Maybe there is no automated testing for any Tor projects? At least a quick search on the wiki only found [1] which lists possible ways to test (but was created 7 months ago and apparently not updated since and collecting dust) and [2] discussing a manual test procedure for TBB. However, tor-0.2.3.25.tar.gz does reveal some test files but the source code ratio of production code to test code is not inspiring at first glance:
[...]
Be aware that we've also been using 'chutney' and 'experimentor' for integration testing. They supplement coverage a bit, though they need more tests, and each tends to hide certain classes of error.
I hate being forgotten in testing discussions. Stem provides rather extensive integration testing targeted at Tor's controller interface.
To run it simply...
git clone https://git.torproject.org/stem.git
cd stem
./run_tests.py --integ
Help expanding the tests even further would be greatly appreciated. -Damian
On Wed, Dec 19, 2012 at 6:31 PM, Damian Johnson atagar@torproject.org wrote:
[...]
Maybe there is no automated testing for any Tor projects? At least a quick search on the wiki only found [1] which lists possible ways to test (but was created 7 months ago and apparently not updated since and collecting dust) and [2] discussing a manual test procedure for TBB. However, tor-0.2.3.25.tar.gz does reveal some test files but the source code ratio of production code to test code is not inspiring at first glance:
[...]
Be aware that we've also been using 'chutney' and 'experimentor' for integration testing. They supplement coverage a bit, though they need more tests, and each tends to hide certain classes of error.
I hate being forgotten in testing discussions. Stem provides rather extensive integration testing targeted at Tor's controller interface.
To run it simply...
git clone https://git.torproject.org/stem.git
cd stem
./run_tests.py --integ
Help expanding the tests even further would be greatly appreciated. -Damian
So I have a built and make tested tor-0.2.3.25/ folder and stem/ in a parallel folder. When I run the command then I get the following:
$ ./run_tests.py --integ
Unable to start tor, 'tor' does not exists.
How can I run the tests on the Tor built in the tor-0.2.3.25/ folder?
Thanks, Simon
So I have a built and make tested tor-0.2.3.25/ folder and stem/ in a parallel folder. When I run the command then I get the following:
$ ./run_tests.py --integ
Unable to start tor, 'tor' does not exists.
How can I run the tests on the Tor built in the tor-0.2.3.25/ folder?
Just run './run_tests.py --integ --tor /path/to/tor'. Without the '--tor' argument it uses the tor instance in your path (and you evidently don't have one).
Another argument that you might be interested in is '--target RUN_ALL' which will exercise all of the connection and authentication mechanisms. This'll take quite a while to run but provides the best test coverage. You can provide '--help' to see the other options.
Stem's integration tests were designed to test stem. Coincidentally they provide basic verification testing of Tor, though I'd love to see them be expanded to more dedicatedly test Tor! -Damian
On Thu, Dec 20, 2012 at 10:50 AM, Damian Johnson atagar@torproject.org wrote:
So I have a built and make tested tor-0.2.3.25/ folder and stem/ in a parallel folder. When I run the command then I get the following:
$ ./run_tests.py --integ
Unable to start tor, 'tor' does not exists.
How can I run the tests on the Tor built in the tor-0.2.3.25/ folder?
Just run './run_tests.py --integ --tor /path/to/tor'. Without the '--tor' argument it uses the tor instance in your path (and you evidently don't have one).
Thanks. That gets me a bit further but it fails as follows:
$ ./run_tests.py --integ --tor ../tor-0.2.3.25
======================================================================
INITIALISING
======================================================================

Performing startup activities...
  checking for orphaned .pyc files... done

======================================================================
INTEGRATION TESTS
======================================================================

Setting up a test instance...
  making test directory (/home/simon/work/20121219-tor/stem/test/data)... done
  configuring logger (/home/simon/work/20121219-tor/stem/test/data/log)... done
  writing torrc (/home/simon/work/20121219-tor/stem/test/data/torrc)... done
    # configuration for stem integration tests
    DataDirectory /home/simon/work/20121219-tor/stem/test/data
    SocksPort 1112
    DownloadExtraInfo 1
    ControlPort 1111

Starting tor...
  failed to start tor: [Errno 13] Permission denied

Shutting down tor... done
Shutting down tor... done
TESTING PASSED (0.1 seconds)
Is that a bug that it says 'TESTING PASSED' even though it failed to start Tor? Is the permission denied error because I need to be root to run the tests? If so, how to change the tests so that being root isn't necessary?
Thanks, Simon
Is that a bug that it says 'TESTING PASSED' even though it failed to start Tor? Is the permission denied error because I need to be root to run the tests? If so, how to change the tests so that being root isn't necessary?
Yea, saying 'PASSED' rather than 'FAILED' would be a bug.
No, the tests definitely don't need to be run as root. Did you clone stem as root? It expects to be able to make a data directory within the test directory.
On Thu, Dec 20, 2012 at 11:48 AM, Damian Johnson atagar@torproject.org wrote:
Is that a bug that it says 'TESTING PASSED' even though it failed to start Tor? Is the permission denied error because I need to be root to run the tests? If so, how to change the tests so that being root isn't necessary?
Yea, saying 'PASSED' rather than 'FAILED' would be a bug.
No, the tests definitely don't need to be run as root. Did you clone stem as root? It expects to be able to make a data directory within the test directory.
It looks like the data directory was created as expected:
$ find test/data/
test/data/
test/data/log
test/data/torrc
Unfortunately ./run_tests.py --help isn't telling me about e.g. a verbosity level option to help give more of a clue as to what is going wrong :-(
Any more ideas?
-- Simon
You want to point at your tor binary, I think, not just the path (i.e. something like "--tor ../tor-2.3.*/src/or/tor")
If you're liking Python, I'd also love some feedback on txtorcon's API (it has 98% unit-test coverage, but doesn't depend on a running Tor, so doesn't really help your use-case of testing Tor I suppose).
I do have a very, very experimental thing for Chutney that rips out its templating + subprocess stuff and puts txtorcon there instead.
https://txtorcon.readthedocs.org
Cheers, meejah
On Thu, Dec 20, 2012 at 12:31 PM, meejah@meejah.ca wrote:
You want to point at your tor binary, I think, not just the path (i.e. something like "--tor ../tor-2.3.*/src/or/tor")
That did the trick, thanks:
$ time ./run_tests.py --integ --tor ../tor-0.2.3.25/src/or/tor
...
Shutting down tor... done
21 TESTS WERE SKIPPED
ALL OTHER TESTS PASSED (34 seconds)

real 0m34.456s
Why do the tests take so long to run? I noticed that most of the time almost no CPU is used and hardly any network is used. Could the individual tests be somehow run in parallel to speed things up? I noticed a few time.sleep() calls in the .py files :-(
-- Simon
You want to point at your tor binary, I think, not just the path (i.e. something like "--tor ../tor-2.3.*/src/or/tor")
That did the trick, thanks:
Oops, I'm sorry about not being clearer about that.
Why do the tests take so long to run? I noticed that most of the time almost no CPU is used and hardly any network is used.
You consider 34 seconds a long time? Heh, the test suite we have at my work takes on the order of twenty minutes to run...
You can see the individual test runtimes to get an idea of where the time's going. The longest tests are things that parse the entire consensus. The sleep() calls you mentioned account for precious little (a few seconds in total), and are mostly there to test things like "Tor emits a BW event every second". Patches welcome.
Could the individual tests be somehow run in parallel to speed things up?
See "Run integration targets in parallel" on...
https://trac.torproject.org/projects/tor/wiki/doc/stem
Cheers! -Damian
Hi Simon, I've made some fixes to hopefully avoid confusion about this in the future (admittedly it was a really bad error message).
Better sanity checking for the tor command... https://gitweb.torproject.org/stem.git/commitdiff/e22a753058a2b9c61301c62e2d...
Reporting testing as having failed if tor can't start... https://gitweb.torproject.org/stem.git/commitdiff/bf2f7f2a461b535ddd21728e29...
Thanks for the catches! -Damian
On Thu, Dec 20, 2012 at 2:43 PM, Damian Johnson atagar@torproject.org wrote:
You want to point at your tor binary, I think, not just the path (i.e. something like "--tor ../tor-2.3.*/src/or/tor")
That did the trick, thanks:
Oops, I'm sorry about not being clearer about that.
No problem.
Why do the tests take so long to run? I noticed that most of the time almost no CPU is used and hardly any network is used.
You consider 34 seconds a long time? Heh, the test suite we have at my work takes on the order of twenty minutes to run...
Yes :-) I've seen projects which have tests that take nearly 10 hours to run. However, the longer the tests take to run, the less likely developers are to run them. IMO all tests should ideally take no more than 1 to 2 minutes to run. So 34 seconds is pretty good, except that ideally Tor needs to have about 100 times as many tests to get code coverage and quality (of Tor itself) up to the 90% plus range. So with these few tests taking 34 seconds, 100 times more tests would take many minutes or even hours. I'm thinking that many thousands of tests should take no longer than 1 to 2 minutes to run.
You can see the individual test runtimes to get an idea of where the time's going. The longest tests are things that parse the entire consensus. The sleep() calls you mentioned account for precious little (in total a few seconds) which is mostly to test things like "Tor emits a BW event every second". Patches welcome.
It would be great if the tests themselves reported their own times. It would also help if they shared a common output format with the standard Tor make test results. When I run the tests it's easy to see which ones take longer because there are large pauses as text scrolls up the screen. However, during those pauses I see almost no CPU, network, or disk activity, which leads me to believe that some tests are not written as well as they could be.
Could the individual tests be somehow run in parallel to speed things up?
See "Run integration targets in parallel" on...
Thanks. So that's a feature on the todo list :-) It looks like the tests are starting up daemons using fixed ports which stops other tests from running in parallel. In the past I have solved this problem by getting common test code to start a particular daemon listening on port zero which makes the OS choose a non-used port for listening on. When doing this then the common test code needs to somehow discover which port the daemon ends up listening on when it is started. A common way to do this is to get the daemon to output the port to its log file. In this way the common test code not only discovers which unique port the daemon is listening on, but also for daemons which take a little time to start up, then the log output with the listening port may also signify when the daemon is ready for work. In this way many tests can run in parallel without having to worry about port collision. However, the production code for the daemon being tested may have to be changed in order to be able to listen on port zero and/or report the port that it actually ends up listening on.
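As a sketch of that port-zero trick (the log line format here is invented for illustration and is not anything Tor or Stem does today): bind to port 0, ask the kernel which port it picked with getsockname(), and print it where the harness can read it.

#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
  int fd = socket(AF_INET, SOCK_STREAM, 0);
  if (fd < 0) { perror("socket"); return 1; }

  struct sockaddr_in sin;
  memset(&sin, 0, sizeof(sin));
  sin.sin_family = AF_INET;
  sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
  sin.sin_port = htons(0);                /* 0 = kernel picks a free port */

  if (bind(fd, (struct sockaddr *)&sin, sizeof(sin)) < 0 ||
      listen(fd, 16) < 0) {
    perror("bind/listen"); return 1;
  }

  socklen_t len = sizeof(sin);
  if (getsockname(fd, (struct sockaddr *)&sin, &len) < 0) {
    perror("getsockname"); return 1;
  }

  /* A test daemon would write this to its log; the harness greps for it. */
  printf("listening on 127.0.0.1:%u\n", (unsigned)ntohs(sin.sin_port));

  close(fd);
  return 0;
}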
So what's the difference between Stem tests and 'Chutney'? AFAIK Chutney is a bunch of WIP Python scripts to setup and execute end-to-end Tor tests. Are the Stem tests not also doing something very similar? Why are neither set of tests included in the Tor repo so that they can be run using make test?
-- Simon
Cheers! -Damian
Yes :-) I've seen projects which have tests which take nearly 10 hours to run. However, the longer the tests take to run then the less likely that developers will run them.
Agreed. I run stem's unit tests more often than the integ tests since those have the runtime of around five seconds. For the integ tests I usually supply the '--test' argument so it only runs the test that I'm interested in.
... except that ideally Tor needs to have about 100 times as many tests to get code coverage and quality (of Tor itself) up to the 90% plus range. So with this few tests taking 34 seconds then 100 times more tests would take in the many minutes / hours range.
Not necessarily. It mostly depends on what the tests do - there's a few tests that take around five seconds each and a whole lot of other ones that take a few milliseconds. We could greatly expand stem's test coverage of tor without impacting the runtime much, and could probably lower the runtime a fair bit if we put some more effort into it.
It would be great if the tests themselves reported their own times.
Feel free to add an option for this, it would be reasonably easy to do. Personally I find that it's enough to know the test module that's taking a while, but I could see per-test runtimes being helpful.
And also had a common output format to the standard Tor make test results.
I haven't looked at tor's testing output. Is it better? I invested quite a bit of time in making stem's test output nice and easily readable.
However, during those pauses then I'm seeing almost no CPU, network, or disk activity which leads me to believe that some tests are not written as well as they could be.
I just ran the integ tests and it pegged the cpu of my poor little netbook (it also took 63 seconds - it would be nice if it only took 34 seconds like your system...). I'm not sure why it isn't showing significant resource use on your system.
It looks like the tests are starting up daemons using fixed ports which stops other tests from running in parallel.
This shouldn't be an issue. Multiple controllers can bind to the control port.
So what's the difference between Stem tests and 'Chutney'?
Stem is a controller library with integration tests to check its interaction with a live tor instance. Its tests focus on the behavior of tor's control interface.
Chutney however is a framework specifically for testing how multiple tor instances interact. It's under very light development by comparison to stem...
https://gitweb.torproject.org/nickm/chutney.git/shortlog
https://gitweb.torproject.org/stem.git/shortlog
Why are neither set of tests included in the Tor repo so that they can be run using make test?
Because they're both separate applications from the core tor executable. Mixing the projects (and their git histories) would be confusing. I would like to see stem be more actively used in core tor development for testing though. In an ideal world new tor controller features would include a corresponding test in stem...
On Sat, Dec 22, 2012 at 7:42 PM, Damian Johnson atagar@torproject.orgwrote:
On Sat, Dec 22, 2012 at 6:16 PM, Simon simonhf@gmail.com wrote:
So what's the difference between Stem tests and 'Chutney'?
Stem is a controller library with integration tests to check its interaction with a live tor instance. Its tests focus on the behavior of tor's control interface.
Chutney however is a framework specifically for testing how multiple tor instances interact. It's under very light development by comparison to stem...
https://gitweb.torproject.org/nickm/chutney.git/shortlog
https://gitweb.torproject.org/stem.git/shortlog
Why are neither set of tests included in the Tor repo so that they can be run using make test?
Because they're both separate applications from the core tor executable. Mixing the projects (and their git histories) would be confusing. I would like to see stem be more actively used in core tor development for testing though. In an ideal world new tor controller features would include a corresponding test in stem...
Simon,
I don't know about Chutney, but Stem is a separate effort from Tor. Stem is related to Tor, is hosted by the project, and there is much communication between the two, but running the Stem tests to learn about Tor code would not be helpful.
As Damian says, Stem could be used to build tests exercising the controller side of Tor code. However, those would be Tor tests, not Stem tests.
... but running the Stem tests to learn about Tor code would not be helpful.
I have mixed feelings on this. Stem's tests are chiefly to test Stem, but the integration tests are for checking how our library interacts with a live Tor instance. There's a lot of value to Stem in preventing bugs and regressions in the Tor controller interface - it helps Stem's users and prevents us from needing to implement gross hacks (see for instance the connection.py code for handling #4881).
I'd love for Tor developers to expand Stem's integ tests to better test's Tor's controller interface. I'm not, however, interested in having Stem test Tor's other functionality (relaying, hidden service, etc). I very much agree that down that road lies terrible scope creep.
Cheers! -Damian
It looks like the data directory was created as expected:
$ find test/data/
test/data/
test/data/log
test/data/torrc
Unfortunately ./run_tests.py --help isn't telling me about e.g. a verbosity level option to help give more of a clue as to what is going wrong :-(
Any more ideas?
What happens when you run '/path/to/tor -f /path/to/stem/test/data/torrc'? That's all that stem is doing to start tor. If it's failing like this then running tor itself should also be failing.