The primary information being measured is the directionality of the first 5k cells sent on a measurement circuit, and a keyed-HMAC of the first domain name requested on the circuit.
I suppose this is kind of a non-question, since you wouldn't be doing it otherwise, but I am surprised that associating the traffic patterns with a single key, the HMAC of the first domain name, is sufficient. Every page or query made to that domain (e.g. DuckDuckGo) will have the same key, potentially with many entirely disparate traffic patterns.
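To make the concern concrete, here is a rough sketch of what I understand the labeling to be. The hash function (SHA-256), the key value, the normalization step, and the name `domain_label` are all my assumptions for illustration, not details taken from the proposal.

```python
import hashlib
import hmac

# Hypothetical secret; the real measurement key and its handling are unknown to me.
SECRET_KEY = b"hypothetical-measurement-key"


def domain_label(domain: str, key: bytes = SECRET_KEY) -> str:
    """Return a keyed HMAC of a domain name as a hex string.

    Assumes SHA-256 and simple lowercase normalization; the proposal
    only says "keyed-HMAC of the first domain name".
    """
    normalized = domain.strip().lower().rstrip(".")
    return hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


# Two very different activities on the same site yield the same label,
# because only the domain name goes into the HMAC.
search_label = domain_label("duckduckgo.com")    # e.g. running a search
settings_label = domain_label("DuckDuckGo.com")  # e.g. loading a settings page
other_label = domain_label("example.org")

print(search_label == settings_label)  # same domain, same label
print(search_label == other_label)     # different domain, different label
```

The label carries no information about the path or query, so every traffic pattern seen for that domain ends up grouped under one key.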
Obviously this is limited by what you can technically achieve in this scenario: you have the plaintext DNS requests, and everything else is going to be TLS-encrypted. The alternative would be to instrument a Tor client/browser and find volunteers to opt in to data collection.
-tom