Hi Philipp,
First, thank you for the input. I will certainly review your discussions with other measurement team members. I'm sorry I wasn't able to attend.
On the subject of databases and why they're a kludge: databases represent relationships between data as joins. A join is a construct the database must maintain; it must be persisted or enforced through integrity constraints. A database may be useful for storing data in its final form and for representing relationships between such entities, but querying requires computation in an interpreted language, and joins are not represented using formal math. (In a manner of speaking, database theory does encompass some abstract mathematical objects, in the form of sets.) Storing data and representing known relationships is what a database is designed for. Analyzing data and finding dynamic relationships is something a database will never do well--it's outside the intended use. Formal (mathematical) methods for representing semantics can always be proved correct using rigorous methods, and will always be faster. Imagine if tor's path selection algorithm were implemented as a database. It would work, but the math-derived implementation would be vastly superior.
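To make the contrast concrete, here's a minimal sketch in Python (the relay names are real directory authorities, but the fingerprints and bandwidth values are made-up placeholders). A join is state the engine has to maintain; the same relationship in set-builder terms is just a definable subset of a Cartesian product, which you can reason about directly:

    # The relationship a database would express as a join, written
    # instead as a set comprehension over a Cartesian product.
    # Fingerprints and bandwidths below are placeholder values.
    relays = {("moria1", "FP1"), ("tor26", "FP2")}
    measurements = {("FP1", 5120), ("FP2", 2048)}

    # { (name, bw) : (name, fp) in relays, (fp', bw) in measurements, fp = fp' }
    relay_bandwidth = {
        (name, bw)
        for (name, fp) in relays
        for (fp2, bw) in measurements
        if fp == fp2
    }

    print(sorted(relay_bandwidth))
    # [('moria1', 5120), ('tor26', 2048)]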
Allow me to clarify further. The formal language described here is used to derive subset languages. In a manner of speaking, the base language is a representation of tor's network communication. By adding grammar to this language, a researcher can formally define the semantic relationships that hold particular interest or meaning. One researcher who is only interested in onionoo-like applications (which is me in this case, not Karsten) would create a grammar describing such content. Another who is interested in a particular class of analysis might have another grammar. Right now my objective in the forks is to make this possible (it isn't currently).
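As a rough sketch of what I mean by derived subset languages (the production names below are hypothetical and much simplified--this is not the actual grammar of tor's directory documents):

    # Illustrative only: a simplified stand-in for a base grammar over
    # tor's network documents, held as plain Python data.
    base_grammar = {
        "document":   ["descriptor", "consensus"],
        "descriptor": ["nickname fingerprint bandwidth published"],
        "consensus":  ["router-status-entry*"],
    }

    # An onionoo-like researcher derives a subset language by adding
    # productions for the semantics they care about, without touching
    # the base grammar.
    onionoo_grammar = dict(base_grammar)
    onionoo_grammar.update({
        "summary": ["nickname fingerprint running-flag"],
        "details": ["descriptor uptime country"],
    })

    print(sorted(onionoo_grammar))

The point is that the base language stays fixed while each researcher's grammar layers new semantic structure on top of it.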
The advantages: it's easy for researchers to maintain, easy for developers to maintain, easy to construct proofs about the system, and easy to implement formal validation methods (which you may really want for some important classes of research).
So there's really not a language to learn, per se. It's a formal method of making all that tor-network gibberish make sense. Once you've described the semantic meaning, it's *all* automatic. Want that semantic relationship to build a shiny viz in R--automatic. Want those semantics to trigger an email about censorship--automatic. Would you rather have a report and a graph describing nodes involved in a potential attack--automatic. Would you like to create JSON representations of related entities--automatic.
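A toy sketch of what "automatic" means here (the field names and the alert rule are made up for illustration): once the semantic relationship is described once as data, each output format is just another renderer over the same description.

    import json

    # Hypothetical example: a semantic relationship, described once.
    relation = {
        "kind": "possible-censorship",
        "relays": ["FP1", "FP2"],
        "observed": "sudden drop in connecting users",
    }

    # JSON representation: derived automatically from the description.
    print(json.dumps(relation, indent=2))

    # Email trigger: just another renderer over the same description.
    def should_alert(rel):
        return rel["kind"] == "possible-censorship"

    if should_alert(relation):
        print("would send alert about", ", ".join(relation["relays"]))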
Strangely, in the history of analysis at tor project, no one has tried this, and it isn't implemented in any reusable, presentable form. I very much doubt a potential sponsor would be willing to sponsor work on metrics-lib, because it's basically useless for analysis (the same goes for the others I've mentioned). A researcher has to do too much work to perform analysis to see tor project as having contributed to making it easy.
I hope that clears things up about having to learn a language. Although that's also possible, the techniques here are not being used to create a programming language; they are being used to perform linguistics on tor data. It is possible, however, to extend this work to define a language for programming, but that's not the primary objective. (An implementation such as I describe would make that possible in a formal way--which is good, of course.)
Regards --leeroy