(Hi! Here's a document I've been poking at, on and off, for a while. There's a lot more to say here, but I think it's ready to share with tor-dev for comment. Thanks!)
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++ 0. PRELIMINARIES.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
============== WHY?
If you've ever looked at even slightly serious software engineering methodology, you've seen some discussion of modularity: the idea that programs should be divided into a number of smaller modules; that complexity should be hidden within modules and not in the interfaces between them; that which modules can call which other modules should be limited; and other little rules like that.
We'd like modules isolated from one another for these reasons:
* Make them simpler to test
* Minimize what is allowed to touch what
* Make tor easier to understand
* Make tor easier to hack on
* Let us write parts of tor in other languages
* Make sandboxing more powerful
This is actually several problems all in one!
- OO/Isolation/Modularity design that doesn't actually split into multiple processes, but makes the callgraph much much simpler.
- Multiprocess sandbox framework with RPC implementing the above abstraction
- Finding the right stuff to carve out into separate sandboxes
- Finding the right stuff to carve into separate modules.
============ Steps/phases:
Over the past 6-8 months we've done a lot of work to simplify Tor's callgraph in preparation for this work; it is now simplified to a pretty extreme degree. Here are some of our next steps, some of which can be done in parallel.
1. Figure out which modules should be able to call each other. Isolate them based on their ability to do so. Refactor until they actually work that way.
   - In some cases we will need to split modules.
   - Hide big piles of modules behind single entry points.
   - Separate src/common and src/or into separate parts.
   - Add tools to enforce this separation.
   - Add tools to make moving Tor functions around safe and not too error-prone.
(See sections 1 and 3 below.)
2. Add backend abstractions as needed to minimize module coupling. These should be abstractions that are friendly to in- and multi-process implementations. We will need at least:
- publish/subscribe{,/acknowledge}.
(See https://en.wikipedia.org/wiki/Publish%E2%80%93subscribe_pattern. The 'acknowledge' abstraction allows the publisher to wait for every subscriber to acknowledge receipt. More on this in section 4 below.)
- handles
(We frequently need handles to objects, where the handles need to be persistent while the objects themselves can go away. We use several patterns for this now; we should just have one handle pattern instead. It would need separate implementations for in-process and cross-process access. See section 2 below for more information.)
- Need a way to mark functions as "internal to module".
We're going to have a larger number of multi-file modules. Right now, our only restrictions for making functions less visible are "static" and the fact that we don't allow src/common to call src/or. Both of these should be cleaned up. (One possible C convention for marking module-internal functions is sketched just after this list.)
3. Write abstractions for splitting code into multiple processes that talk over IPC.
(See section 5 below)
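As an illustration of the "internal to module" marking in step 2, here is one possible C convention. The header name and macro names are hypothetical (the dirvote module is just used as an example), not an existing Tor interface; a build-time check such as the nm-based tool in section 3 would enforce it.

  /* dirvote_internal.h -- hypothetical "internal" header for a multi-file
   * "dirvote" module.  Only files inside the module define
   * DIRVOTE_MODULE_INTERNAL before including this header, so any outside
   * caller fails to compile (and shows up in the callgraph checks). */
  #ifndef DIRVOTE_MODULE_INTERNAL
  #error "This header is internal to the dirvote module."
  #endif

  /* With GCC/clang, also hide the symbols from other compilation units
   * once the module is built as its own library. */
  #if defined(__GNUC__) || defined(__clang__)
  #define DIRVOTE_INTERNAL __attribute__((visibility("hidden")))
  #else
  #define DIRVOTE_INTERNAL
  #endif

  DIRVOTE_INTERNAL int dirvote_count_votes_impl(void);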
========================= Metrics
We can measure our success in a number of ways:
* How large are the strongly connected components among modules calling one another? Right now the largest SCC is 52 C files.
* How many layer/module violations can we find? This needs to be an automated process.
* How many exported functions from each module can we remove?
* How much can we remove from the main linux sandbox code? How many functions can we isolate to purpose-built modules?
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++ 1. THE MODULES
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
========================= Modules in src/or
LOWEST LEVEL
These should only get called from higher-level components. They shouldn't call anything of a higher level.
Types and data structures:
   - part of configuration
   - parts of statefile
   - routerset
   - policy (some)
   - torcert

Various backends and utilities:
   - confparse
   - geoip
   - fp_pair?
   - reasons
   - replaycache
   - buffers

Nodelist data structures, excluding routerinfo:
   - nodelist
   - microdesc
   - networkstatus
   - routerparse.c (parts)
SLIGHTLY HIGHER
These should only get called from higher-level components, and only call lower-level components. When they currently call a higher-level piece, they should use a callback or a publish/subscribe API or something.
Nothing in here should really be tor-specific.
mainloop:
   - connections (common, generic parts)
   - main (nonspecific parts)
   - config (maybe)
   - workqueue
   - scheduler
   - periodic
   - cpuworker (parts)
   - hibernate? (parts?)
HIGHER THAN THAT
These modules are where we start making Tor do Tor. They should only call lower-level modules.

onionrouting:
   - channels
   - onion*
   - cpuworker (parts)
   - connection_or
   - connection_edge
   - circuitlist
   - command
   - relay
   - parts of router.c
   - transports
HIGHER STILL
These modules can call downwards, though they should really be made as independent of one another as possible.
We should enumerate which of these can call which others; mostly, they should leave one another alone.
hidden services:
   - rend*

controller:
   - control.c

auxiliary:
   - rephist

authority:
   - dirvote
   - dirserver (parts)
   - dirauth
   - dircollate
   - keypin

relay:
   - ext_orport
   - parts of router.c
   - routerkeys.c

cache:
   - parts of directory
   - dirserver (parts)
   - routerlist (parts)
   - routerparse.c (parts)

exit:
   - dns

client:
   - dnsserv
   - most circuitbuild, circuituse stuff
   - circpathbias
   - circuitstats

directory:
   - directory
   - dirserver
HIGHEST LEVEL
Here are the parts of the application that know all the other parts, and call down. Nothing else calls up to these modules.
application (highest level):
   - main (parts)
   - parts of config
   - parts of statefile
   - ntmain
   - status
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++ 2. INFRASTRUCTURE: HANDLES.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
======================== Handles in C
* We need a light-weight handle mechanism. (A mechanism where we can have a reference to X that can still live even when X dies, where lookups of X through the reference are safe even when X is freed, and where lookups of X through the reference are FAST.)
* It's okay if the things being handled need to be marked as 'handleable'.
* When the handle is in a separate process, we probably need a table-based implementation:
   * a unique handle (64/96-bit int) for each thingie, plus a hashtable to look objects up by handle.
* When the reference is local, lookup can be as fast as a pointer dereference.
* Thread safety can be important but is not always required.
For an example implementation, see #18362.
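To make this concrete, here is a minimal in-process sketch of what such a handle API could look like in plain C. The names are illustrative only (this is not the #18362 interface), error handling is omitted, and it is not thread-safe; a cross-process version would replace the pointer with a 64/96-bit key into a hashtable.

  #include <stdlib.h>

  /* Each "handleable" object owns one status record that can outlive it. */
  typedef struct handle_status_t {
    void *object;      /* Set to NULL when the object is freed. */
    unsigned refcount; /* Live handles, plus 1 while the object exists. */
  } handle_status_t;

  typedef struct handle_t {
    handle_status_t *status;
  } handle_t;

  /* Called from the object's constructor. */
  static handle_status_t *
  handle_status_new(void *object)
  {
    handle_status_t *st = calloc(1, sizeof(*st));
    st->object = object;
    st->refcount = 1;
    return st;
  }

  /* Make a new handle for the object behind 'st'. */
  static handle_t *
  handle_new(handle_status_t *st)
  {
    handle_t *h = calloc(1, sizeof(*h));
    h->status = st;
    ++st->refcount;
    return h;
  }

  /* FAST lookup: one pointer dereference; NULL if the object is gone. */
  static void *
  handle_get(const handle_t *h)
  {
    return h->status->object;
  }

  static void
  handle_status_decref(handle_status_t *st)
  {
    if (--st->refcount == 0)
      free(st);
  }

  /* Called from the object's destructor. */
  static void
  handle_status_object_freed(handle_status_t *st)
  {
    st->object = NULL;
    handle_status_decref(st);
  }

  /* Release one handle. */
  static void
  handle_free(handle_t *h)
  {
    handle_status_decref(h->status);
    free(h);
  }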
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++ 3. INFRASTRUCTURE: ENFORCING ISOLATION.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
=== Making isolation happen.
It'll do us no good to declare that modules are isolated if they actually aren't. Once we have defined the allowed call patterns, we need to make sure that calls that fall outside of those patterns aren't permitted.
We can build something pretty simple out of 'nm', I believe.
See ticket #18617. This is a work in progress.
=== Moving functions around.
I'm working on a tool to move functions from one file to another in a deterministic way based on annotations in the original file. This lets us version our movement, and have the movement happen deterministically so that the patches are easier to verify. If we integrate it with our callgraph-examination tools above, we can make sure that our movement plans actually make the code more conformant with our modularity plans.
See ticket #18618. This is a work in progress.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++ 4. INFRASTRUCTURE: PUBLISH/SUBSCRIBE
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Many of our modularity violations happen because when some property changes, or some event occurs, we need to inform a wide variety of other modules. The publish/subscribe{,/acknowledge} pattern, and the similar observer pattern, were made for this.
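As a sketch of the in-process half of this abstraction, here is what a minimal publish/subscribe-with-acknowledge API could look like in C. All names and signatures are illustrative assumptions, not an existing Tor API; a multi-process version would marshal the event over IPC instead of walking a local list.

  #include <stdlib.h>

  /* A published event: a type identifier plus an opaque payload. */
  typedef struct pubsub_event_t {
    int type;
    void *data;
  } pubsub_event_t;

  /* Subscriber callback; returns 0 to acknowledge receipt. */
  typedef int (*pubsub_subscriber_fn)(const pubsub_event_t *ev, void *arg);

  typedef struct pubsub_subscription_t {
    pubsub_subscriber_fn fn;
    void *arg;
    struct pubsub_subscription_t *next;
  } pubsub_subscription_t;

  /* One channel per event type.  Modules subscribe without ever naming the
   * publisher, which is what breaks the module-to-module call edges. */
  typedef struct pubsub_channel_t {
    pubsub_subscription_t *subscribers;
  } pubsub_channel_t;

  static void
  pubsub_subscribe(pubsub_channel_t *ch, pubsub_subscriber_fn fn, void *arg)
  {
    pubsub_subscription_t *s = calloc(1, sizeof(*s));
    s->fn = fn;
    s->arg = arg;
    s->next = ch->subscribers;
    ch->subscribers = s;
  }

  /* Publish-and-acknowledge: returns 0 only if every subscriber acknowledged. */
  static int
  pubsub_publish(pubsub_channel_t *ch, const pubsub_event_t *ev)
  {
    int result = 0;
    pubsub_subscription_t *s;
    for (s = ch->subscribers; s; s = s->next) {
      if (s->fn(ev, s->arg) != 0)
        result = -1;
    }
    return result;
  }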
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++ 5. ADVANCED INFRASTRUCTURE: MULTIPROCESS SANDBOXING
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
======================== Multiprocess abstraction / design
We'll consider multiprocess for these cases:
* Easier/stronger/more-portable sandboxing.
* Writing subcomponents in languages whose runtime/link rules don't play so nicely with C.
* Enforcing modularity when modularity delivers security.
For sandboxing, here are some high-risk privileges we should make sure that other code doesn't have.
* Filesystem calls -- anything that can open a file.
* Exec -- anything that can launch a process.
* Access to identity or signing keys.
* Invoking getaddrinfo() [because it uses a fairly huge segment of the underlying operating system].
And here is some higher-risk code that we could most safely isolate from the main modules.
* Anything that parses an object.
* Consensus diff processing.
Our basic abstractions are those described above (pub/sub, handles, hooks), with the addition of:
* Blocking RPC
* Capabilities????
* Message queues.
Controversial:

* We should NOT require that most of this be blindingly fast. Correct is perfectly adequate. Fast is only needed in the critical path.
Here's the general architecture I have in mind:

* At the highest level, have a launcher/supervisor process whose only job is to start the other processes, monitor them, and notice if they go down.
* Additionally, have a communications-hub process whose role is to pass messages between other processes, performing access control on the messages to make sure that each message is only sent by a process that is allowed to send it, to a process that it is allowed to send to. Processes may only communicate with the hub.
* Messages should use a simple RPC format. I vote for protobufs or capnproto. They sure have a speed/simplicity tradeoff though.
* Pipes or socketpairs should get used on Unix. Windows can use Windows named pipes, which aren't quite fds. Windows will need to have its pipes in a different thread from the socket-based libevent loop (if there is one).
* We need a way to pass fds or handles back and forth. Both Windows and Unix can do this (via DuplicateHandle() on Windows and sendmsg() on PF_UNIX sockets on Unix). The protocol needs to handle this explicitly.
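As a sketch of the Unix side of that last point, here is roughly what passing an fd over an AF_UNIX socketpair looks like (standard SCM_RIGHTS usage; the Windows DuplicateHandle() path would look quite different, and the framing around it is up to the RPC protocol):

  #include <string.h>
  #include <sys/socket.h>
  #include <sys/uio.h>

  /* Send one file descriptor over an AF_UNIX socket, alongside a single
   * byte of ordinary payload so the message is never empty.
   * Returns 0 on success, -1 on failure. */
  static int
  send_fd(int sock, int fd_to_send)
  {
    char payload = 'F';
    struct iovec iov = { .iov_base = &payload, .iov_len = 1 };
    struct msghdr msg;
    struct cmsghdr *cmsg;
    union {
      char buf[CMSG_SPACE(sizeof(int))];
      struct cmsghdr align;   /* Forces correct alignment of buf. */
    } u;

    memset(&msg, 0, sizeof(msg));
    memset(&u, 0, sizeof(u));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;          /* "This message carries fds." */
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd_to_send, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
  }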
Here's the development plan I would suggest:

* In parallel, work on improving Tor's modularity as the earlier sections of this document suggest. Unless we get more modular, we won't be able to make anything actually separate besides the privilege-restriction sandboxing parts above (like isolating keys and FS access).
* Research what exactly Windows and OS X allow.
* Write the top-level supervisor process and hub process. (A bare-bones supervisor loop is sketched after this list.)
* Ensure that the APIs we expose can be written in C in a fairly mechanism-agnostic way, so that we can migrate to this architecture with minimal additional effort.
* Move privilege-restriction sandboxing parts into subprocesses.
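And as a bare-bones illustration of the supervisor idea on Unix, here is roughly what its main loop could look like. The worker binary names are made up, and a real version would add rate-limited restarts, logging, privilege dropping, and a Windows equivalent.

  #include <stdio.h>
  #include <sys/types.h>
  #include <sys/wait.h>
  #include <unistd.h>

  #define N_WORKERS 2

  /* Start one worker process; returns its pid, or -1 on failure. */
  static pid_t
  launch_worker(const char *path)
  {
    pid_t pid = fork();
    if (pid == 0) {
      execl(path, path, (char *)NULL);
      _exit(127); /* exec failed */
    }
    return pid;
  }

  int
  main(void)
  {
    /* Hypothetical worker binaries: the message hub and one sandboxed module. */
    const char *workers[N_WORKERS] = { "./tor-hub", "./tor-dircache" };
    pid_t pids[N_WORKERS];
    int i;

    for (i = 0; i < N_WORKERS; ++i)
      pids[i] = launch_worker(workers[i]);

    /* Supervise: whenever a worker exits, restart it. */
    for (;;) {
      int status;
      pid_t dead = waitpid(-1, &status, 0);
      if (dead < 0)
        break; /* No children left, or a signal we don't handle here. */
      for (i = 0; i < N_WORKERS; ++i) {
        if (pids[i] == dead) {
          fprintf(stderr, "worker %s exited; restarting\n", workers[i]);
          pids[i] = launch_worker(workers[i]);
        }
      }
    }
    return 0;
  }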
And here is an alternative plan:

* Investigate the chromium sandbox code; see if it can be extracted, or how much of it can be extracted.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++ A. APPENDIX.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
=== progress so far
I've gotten these done so far:
* The beginnings of a build-time callgraph enforcement mechanism using nm and readelf.
* A handle-based-reference module. (See ticket #18362, but cpunks doesn't like it.)
* A draft refactoring tool that lets you annotate code with what should move where, so that you can deterministically try the code movement and see what happens, and so you can see what effects it will have on the module-level callgraph.
On Mon, Mar 28, 2016 at 6:49 AM, Rob van der Hoeven robvanderhoeven@ziggo.nl wrote:
>> Add backend abstractions as needed to minimize module coupling. These should be abstractions that are friendly to in- and multi-process implementations. We will need at least:
>>
>> publish/subscribe{,/acknowledge}.
>>
>> (See https://en.wikipedia.org/wiki/Publish%E2%80%93subscribe_pattern. The 'acknowledge' abstraction allows the publisher to wait for every subscriber to acknowledge receipt. More on this in section 4 below.)

> Maybe ZeroMQ can do this. See:
>
> https://en.wikipedia.org/wiki/ZeroMQ
>
> and:
ZeroMQ and its competitors are pretty good, but overkill. They're designed to work in a distributed environment with unreliable network connections, whereas for this application I'm only thinking about splitting a single Tor instance across multiple processes on the same host.
> Question: how are these modules you write about implemented? Do you plan to make each module a Dll? Will it be possible to only load a Dll if its functions are needed? I ask this because I currently have Tor running on my router and many of its functions (hidden services, node, etc.) are not needed.
I haven't been working on the problem from that angle, but I would expect that making the code more modular will make it easier to compile only the modules that are required.
best wishes,
Nick Mathewson nickm@alum.mit.edu writes:
> ZeroMQ and its competitors are pretty good, but overkill. They're designed to work in a distributed environment with unreliable network connections, whereas for this application I'm only thinking about splitting a single Tor instance across multiple processes on the same host.
ZeroMQ has an "INPROC" transport that works for inter-thread communication (and it's way faster than the networked ones, even unix-sockets, at least a few years back when I benchmarked some things involving ZeroMQ in C++).
Recently someone leaked an enormous amount of docs (2.6 TiB) to journalists [1]. It's still hard to do such a thing even over the plain old Internet. It's highly possible that these docs were transferred on a physical hard drive, despite that being really *risky*.

Anyway, in the framework of anonymous whistleblowing (i.e. SecureDrop and Tor specifically), it seems to be an interesting case. I'm wondering about the following aspects:

o Even if we use exit mode/non-anonymous onions (RSOS), is such leaking reliable? The primary issue here is the time of transmission. It's much longer than any time period we have in Tor.

o What is going to happen to the connection after the HS republishes its descriptor? Long after? [This one is probably fine if we are not using IPs, but...]

o Most importantly, is transferring data on a >1 TiB scale (or just transferring data for days) safe at all? At the very least the source should not change their location/RP/circuits. Or they need to pack all this stuff into chunks and send them separately; it's not obvious how that can be done properly. So at what point should the source stop the transmission (size/time/etc.), change location or the guard, or pick a new RP?
--
[1] http://panamapapers.sueddeutsche.de/articles/56febff0a1bb8d3c3495adf4/
--
Happy hacking,
Ivan Markin
How do you transmit an elephant? One byte at a time...
But on a serious note, it's possible to transfer 2.6TB over Tor in small pieces (such as file by file or via torrent). Given the size, however, I'd suspect they mailed hard drives after establishing contact with journalists. Even on a fairly fast connection, 2.6TB would take quite a while...
~Griffin
On 4/04/2016 10:31 AM, Griffin Boyce wrote:
> How do you transmit an elephant? One byte at a time...

rsync is a beautiful thing. Have different clients / nodes accessing separate file paths. If the transfer drops out / is too slow, start up rsync again.
On 4/3/16, Griffin Boyce griffin@cryptolab.net wrote:
> How do you transmit an elephant? One byte at a time...
>
> But on a serious note, it's possible to transfer 2.6TB over Tor in small pieces (such as file by file or via torrent). Given the size, however, I'd suspect they mailed hard drives after establishing contact with journalists. Even on a fairly fast connection, 2.6TB would take quite a while...
That amount of data would take 27 days at 10Mbps. Few would be willing to sit supervising in a hotseat that long when they can physically mail 3TB for $100 and 8TB for $230. Though they might spend 3 days pushing 100Mbps via shells, etc. Overlay networks move data reasonably well, and reliability could be handled by chunking protocols. Available link speeds (and thus path speeds) are likely to be the limiting factor: 10Mbps limits you to 100GiB a day. Though at 1Mbps, DVD torrenting on, say, I2P seems to be a thing.
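(For reference, a back-of-the-envelope check of those figures, assuming 2.6 TiB of data over a sustained 10 Mbit/s path:)

  \frac{2.6 \times 2^{40}\ \text{bytes} \times 8\ \text{bits/byte}}{10^{7}\ \text{bits/s}}
    \approx 2.3 \times 10^{6}\ \text{s} \approx 27\ \text{days},
  \qquad
  10^{7}\ \text{bits/s} \times 86400\ \text{s/day}
    = 1.08 \times 10^{11}\ \text{bytes/day} \approx 100\ \text{GiB/day}.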
Hi.
My general feeling here is that it's more useful for me to tell you how I think people should share files than it would be for me to answer your questions; sorry, not sorry.
Alice and Bob can share lots of files, and they can do so with their Tor onion services. They should be able to exchange files without both of them having to be online at the same time. Are you sure you've chosen the right model for file sharing?
If you want reliability then you should avoid single points of failure such as a single Tor circuit or a single onion service; further, the high-availability property might be important for certain types of file-sharing situations.
If Alice and Bob share a confidential, authenticated communications channel then they can use that to exchange key material and secret connection information. That should be enough to bootstrap the exchange of large amounts of documents:
- Alice is clueful about distributed content-addressable ciphertext storage, so she decides to operate a Tahoe-LAFS storage grid over onion services.
- Alice uploads her ciphertext to the Tahoe grid.
- Alice sends Bob the secret grid connection information and the cryptographic capability to read her files.
In this situation Alice really doesn't care where her storage nodes are hosted, or whether the virtual server hosting provider can be depended on not to get hacked or receive a national security letter. Why does Alice give zero fucks? Ciphertext. "They" have her ciphertext and it's useless without a key compromise. Anyone who hacks the storage servers she is operating gets to see some interesting and useful metadata, such as the size of the files and what time they are read; that is not nearly as bad as a total loss of confidentiality.
https://gnunet.org/sites/default/files/lafs.pdf
However, what if Alice decides that Bob is a useless human being and she should instead publicize the documents herself? She writes her own badass adversary-resistant distributed ciphertext storage system and convinces several organizations worldwide to operate storage servers in various countries, and thus several legal jurisdictions.
She can now gleefully upload ciphertext via onion services to the storage servers and then simply publicize the key material for the specific files she wishes to share with the world or an individual. She can make this system censorship-resistant by using an erasure encoding for storing the ciphertext. For instance, Tahoe-LAFS uses Reed-Solomon encoding such that any K of N shares can be used to reconstruct the ciphertext of the file. In this case, if an adversary wanted to censor Alice's ciphertext publication they would have to DOS-attack N-K+1 servers (with Tahoe's default 3-of-10 encoding, that means taking out 8 of the 10 servers).
> Recently someone leaked an enormous amount of docs (2.6 TiB) to journalists [1]. It's still hard to do such a thing even over the plain old Internet. It's highly possible that these docs were transferred on a physical hard drive, despite that being really *risky*.

No, that's not necessarily correct; if the drives contain ciphertext and the key was not compromised then the situation would not be risky.