Hi Mike,

Just a few questions about the proposal, inline below:

On 12 Sep 2015, at 10:34, Mike Perry <mikeperry@torproject.org> wrote:

Here's a proposal describing some padding negotiation cell primitives that
should be useful to defend against website traffic fingerprinting, hidden
service circuit fingerprinting, and probably just about any other traffic
analysis attack under the sun.

The proposal is in my git remote at:
https://gitweb.torproject.org/user/mikeperry/torspec.git/tree/proposals/xxx-padding-negotiation.txt?h=padding_negotiation

==============================

Filename: xxx-padding-negotiation.txt
Title: Padding Negotiation
Authors: Mike Perry
Created: 01 September 2015
Status: Draft


...


3. End-to-end circuit padding

For circuit-level padding, we need two types of additional features: the
ability to schedule additional incoming cells at one or more fixed
points in the future, and the ability to schedule a statistical
distribution of arbitrary padding to overlay on top of non-padding
traffic (aka "Adaptive Padding").

In both cases, these messages will be sent from clients to middle nodes
using the "leaky pipe" property of the 'recognized' field of RELAY
cells, allowing padding to originate from middle nodes on a circuit in a
way that is not detectable from the Guard node.

This same mechanism can also be used to request padding from the Guard
node itself, to achieve link-level padding without the additional
overhead requirements on middle nodes.

3.1. Fixed-schedule padding message (RELAY_COMMAND_PADDING_SCHEDULE)

The fixed schedule padding will be encoded in a
RELAY_COMMAND_PADDING_SCHEDULE cell. It specifies a set of up to 80
fixed time points in the future to send cells.

Do we want to add a length field, for schedules with less than 80 time points?
Or is this unnecessary?

XXX: 80 timers is a lot to allow every client to create. We may want to
have something that checks this structure to ensure it actually
schedules no more than N in practice, until we figure out how to
optimize either libevent or timer scheduling/packet delivery. See also
Section 4.3.

The RELAY_COMMAND_PADDING_SCHEDULE body is specified in Trunnel as
follows:

   struct relay_padding_schedule {
      /* Number of microseconds before sending cells (cumulative) */
      u32 when_send[80];

      /* Number of cells to send at time point sum(when_send[0..i]) */
      u16 num_cells[80];

      /* Adaptivity: If 1, and server-originating cells arrive before the
         next when_send time, then decrement the next non-zero when_send
         index, so we don't send a padding cell then, too */
      u8 adaptive IN [0,1];

Is this the definition of “adaptivity" that we want?

I can see that it works in the scenario where a client wants a minimum of
X cells delivered at-or-before time T (adaptively), where T is a single schedule
arbitrarily far in the future.

But if a client wants a minimum of X cells delivered every T microseconds
(adaptively), then there’s no way to reliably specify that using this scheme. If
10*X cells originate at the server due to regular traffic in the first interval, that will
cause the next 10 intervals to be skipped.

I can see that the histograms work around this issue by refilling with the original
figures, but I’m not sure that works for fixed-point padding.

   };

To allow both high-resolution time values, and the ability to specify
timeout values far in the future, the time values are cumulative. In
other words, sending a cell with when_send = [MAX_INT, MAX_INT, MAX_INT,
0...] and num_cells = [0, 0, 100, 0...] would cause the relay to reply
with 100 cells in 3*MAX_INT microseconds from the receipt of this cell.

3.2. Adaptive Padding message (RELAY_COMMAND_PADDING_ADAPTIVE)

...

3.2.4. Token removal and refill

If the remove_tok field is set to true for a given state's histogram,
then whenever a padding packet is sent, the corresponding histogram
bin's token count is decremented by one.

If a packet matching the current state's transition_reschedule_events
bitmask arrives from the server before the chosen padding timer expires,
then a token is removed from the nearest non-empty bin corresponding to
the delay since the last packet was sent, and the padding packet timer
is re-sampled from the histogram.

Does nearest mean “next highest” or “closest”?


If the entire histogram becomes empty, it is then refilled to the
original values.

...

Thanks

Tim

Tim Wilson-Brown (teor)

teor2345 at gmail dot com
PGP: 968F094B (ABFED1AC & A39A9058 expire 15 Sep 2015)

teor at blah dot im
OTR CAD08081 9755866D 89E2A06F E3558B7F B5A9D14F (From 1 Sep 2015)