On Tue, Sep 10, 2024 at 6:26 PM Jim Newsome jnewsome@torproject.org wrote:
> It'd be helpful to have more context about the object IDs and what we're trying to accomplish with them here; why we need/want them in arti but didn't in c-tor. I'm inferring (maybe incorrectly) that the idea is that this is effectively letting us multiplex differently-configured SOCKS->Tor services on a single port. And/or maybe to multiplex multiple data connections over a single SOCKS socket? Is it worth doing these vs the alternatives (a listening port per service/object and a socket per data stream)? e.g. is this fixing some current resource exhaustion issue, or one we expect to be more problematic in arti...?
> Maybe worth mentioning the length limit for user and password (255 I believe) and that it'll be sufficient (?)
> Otherwise LGTM
This is a good question! Right now there isn't a complete spec for arti RPC, but for background you could have a look at the file `rpc-meta-draft.md` ( https://gitlab.torproject.org/tpo/core/arti/-/blob/main/doc/dev/notes/rpc-me... ) in arti, as amended by the WIP branch at https://gitlab.torproject.org/tpo/core/arti/-/merge_requests/2386 .
This doesn't (yet) describe the DataStream protocol, since that's what we're trying to hammer out here. There's a comment in arti::socks about that which I hope to migrate to rpc-meta-draft once it is accurate. I'll copy out the relevant parts below, since many of them are about to be overwritten by this proposal.
Sorry about so many incomplete documents! I hope that this will help answer the questions. If not, please just poke me again.
/// ## Key concepts
///
/// A data stream is "RPC-visible" if, when it is created via SOCKS,
/// the RPC system is told about it.
///
/// Every RPC-visible stream is associated with a given RPC object when it is created.
/// (Since the RPC object is being specified in the SOCKS protocol,
/// it must be one with an externally visible Object ID.
/// Such Object IDs are cryptographically unguessable and unforgeable,
/// and are qualified with a unique identifier for their associated RPC session.)
/// Call this RPC Object the "target" object for now.
/// This target RPC object must implement
/// the [`ConnectWithPrefs`](arti_client::rpc::ConnectWithPrefs) special method.
///
/// Right now, there are two general kinds of objects that implement this method:
/// client-like objects, and stream-like objects.
///
/// A client-like object is either a `TorClient` or an RPC `Session`.
/// It knows about, and is capable of opening, multiple data streams.
/// Using it as the target object for a SOCKS connection tells Arti
/// that the resulting data stream (if any)
/// should be built by it, and associated with its RPC session.
///
/// An application gets a TorClient by asking the session for one,
/// or by asking a TorClient to give it a new variant clone of itself.
///
/// A stream-like object is an `arti_rpcserver::stream::RpcDataStream`.
/// It is created from a client-like object, but represents a single data stream.
/// When created, it is not yet connected or trying to connect to anywhere:
/// the act of using it as the target Object for a SOCKS connection causes
/// it to begin connecting.
/// (You can also think of this as a single-use client,
/// which once used, becomes interchangeable with the DataStream it created.)
/// (TODO: We may wish to change this vocabulary.
/// We may wish to call this a "stream handle", for instance?)
///
/// An application gets an RpcDataStream by calling `arti:new_stream_handle`
/// on any client-like object. Currently, this always creates an RpcDataStream
/// that makes optimistic connections; see #1583.
...
/// ## Intended use cases (examples)
///
/// (These examples assume that the application
/// already knows the SOCKS port it should use.
/// I'm leaving out the isolation strings as orthogonal.)
///
/// These are **NOT** the only possible use cases;
/// they're just the two that help understand this system best (I hope).
///
/// ### Case 1: Using a client-like object directly.
///
/// Here the application has authenticated to RPC
/// and gotten the session ID `SESSION-1`.
/// (In reality, this would be a longer ID, and full of crypto.)
///
/// The application wants to open a new stream to www.example.com.
/// They don't particularly care about isolation,
/// but they do want their stream to use their RPC session.
/// They don't want an Object ID for the stream.
///
/// To do this, they make a SOCKS connection to arti,
/// with target address www.example.com.
/// They set the username to `<arti-rpc-session>`,
/// and the password to `SESSION-1`.
///
/// Arti looks up the Session object via the `SESSION-1` object ID
/// and tells it (via the ConnectWithPrefs special method)
/// to connect to www.example.com.
/// The session creates a new DataStream using its internal TorClient,
/// but does not register the stream with an RPC Object ID.
/// Arti proxies the application's SOCKS connection through this DataStream.
///
///
/// ### Case 2: Creating an identifiable stream.
///
/// Here the application wants to be able to refer to its DataStream
/// after the stream is created.
/// As before, we assume that it's on an RPC session
/// where the Session ID is `SESSION-1`.
///
/// The application sends an RPC request of the form:
/// `{"id": 123, "obj": "SESSION-1", "method": "arti:new_stream_handle", "params": {}}`
///
/// It receives a reply like:
/// `{"id": 123, "result": {"id": "STREAM-1"} }`
///
/// (In reality, `STREAM-1` would also be longer and full of crypto.)
///
/// Now the application has an object called `STREAM-1` that is not yet a connected
/// stream, but which may become one.
///
/// The application opens a SOCKS connection as before.
/// For the username it sends `<arti-rpc-session>`,
/// and for the password it sends `STREAM-1`.
///
/// Now Arti looks up the `RpcDataStream` object via `STREAM-1`,
/// and tells it (via the ConnectWithPrefs special method)
/// to connect to www.example.com.
/// This causes the `RpcDataStream` internally to create a new `DataStream`,
/// and to store that `DataStream` in itself.
/// The `RpcDataStream` with Object ID `STREAM-1`
/// is now an alias for the newly created `DataStream`.
/// Arti proxies the application's SOCKS connection through that `DataStream`.
///
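To make the two cases concrete on the wire, here is a rough sketch of the SOCKS5 username/password message an application would send in each one. It is not code from arti: the helper name `socks5_auth_request` and the constant `RPC_USERNAME` are made up for illustration, and `SESSION-1` / `STREAM-1` are the placeholder IDs from the comment above. The only thing it takes from outside this thread is the RFC 1929 framing, where each field gets a one-byte length, which is where the 255-byte limit comes from. Whether real Object IDs always fit under that limit is not something this sketch settles.

```rust
/// Username used in the examples above to mark a SOCKS request as RPC-aware.
const RPC_USERNAME: &str = "<arti-rpc-session>";

/// Hypothetical helper: encode an RFC 1929 username/password message.
///
/// Layout: VER (0x01), ULEN, UNAME, PLEN, PASSWD.
/// Returns `None` if either field is empty or longer than 255 bytes,
/// because each length has to fit in a single byte.
fn socks5_auth_request(username: &str, password: &str) -> Option<Vec<u8>> {
    let (ulen, plen) = (username.len(), password.len());
    if ulen == 0 || ulen > 255 || plen == 0 || plen > 255 {
        return None;
    }
    let mut msg = Vec::with_capacity(3 + ulen + plen);
    msg.push(0x01); // version of the username/password sub-negotiation
    msg.push(ulen as u8); // safe: checked above to be in 1..=255
    msg.extend_from_slice(username.as_bytes());
    msg.push(plen as u8); // safe: checked above to be in 1..=255
    msg.extend_from_slice(password.as_bytes());
    Some(msg)
}

fn main() {
    // Case 1: the password is the session's Object ID.
    let case1 = socks5_auth_request(RPC_USERNAME, "SESSION-1")
        .expect("placeholder IDs are well under 255 bytes");
    // Case 2: the password is the stream handle's Object ID.
    let case2 = socks5_auth_request(RPC_USERNAME, "STREAM-1")
        .expect("placeholder IDs are well under 255 bytes");
    println!("case 1: {} bytes, case 2: {} bytes", case1.len(), case2.len());
}
```

The two cases differ only in which Object ID goes into the password field; the SOCKS target address (www.example.com) is sent separately in the CONNECT request that follows.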