Hello devs,
I'm seeking advice from people with experience in writing server-side Java applications.
Let me give you some background about this request: for the past five years, I have been developing server-side Java applications which all process large amounts of Tor directory data and provide their output via a web interface.
Examples:
- The metrics data processor (metrics-db) fetches Tor descriptors from the Tor directory authorities, the bridge authority, etc., performs some sanity checks, and provides descriptors by type as tarballs. We're talking about roughly 7 GiB of new bzip2-compressed data per month.
- The metrics website (metrics-web) uses the output from the metrics data processor, stuffs everything into a database, computes aggregates, and presents results in graphs and .csv files.
- The Onionoo service processes the same data from the metrics data processor, but provides statistics per Tor relay, not for the Tor network as a whole. The processing is done every two hours and may take 30 minutes to 1.5 hours, depending on how overloaded the server is.
- The ExoneraTor service, again, uses the same data and puts it in a database to answer whether a certain IP address has been a Tor relay at some point in the past (a sketch of that kind of lookup follows below).
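To give a rough idea of the last item, the lookup has roughly the following shape. This is only a simplified sketch with made-up table and column names, not the actual code or schema:

import java.sql.Connection;
import java.sql.Date;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class RelayLookup {

  /* Hypothetical example: check whether an IP address was listed as a
   * relay on a given date (yyyy-mm-dd). Table and column names are
   * invented for illustration. */
  public static boolean wasRelay(String ipAddress, String date)
      throws SQLException {
    String query = "SELECT COUNT(*) FROM relay_statuses "
        + "WHERE address = ? AND valid_after_date = ?";
    try (Connection conn = DriverManager.getConnection(
            "jdbc:postgresql://localhost/exonerator");
         PreparedStatement ps = conn.prepareStatement(query)) {
      ps.setString(1, ipAddress);
      ps.setDate(2, Date.valueOf(date));
      try (ResultSet rs = ps.executeQuery()) {
        rs.next();
        return rs.getInt(1) > 0;
      }
    }
  }
}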
That's what these services do. And here's how it's done under the hood:
- There are one or more cronjobs, each of which starts an Ant task to process data. Some of these tasks import data into the database; others store results in the file system.
- Each application uses a web application deployed in Tomcat to provide results to web users. Most things are written as servlets; some use JSPs. (A minimal example of such a servlet follows below.)
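To illustrate the second point, the web-facing parts are plain servlets along these lines; the class and path names here are made up, it's just a sketch:

import java.io.IOException;
import java.io.PrintWriter;
import javax.servlet.ServletException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

/* Hypothetical servlet deployed in Tomcat that hands out results
 * precomputed by the cronjob-driven Ant tasks. */
@WebServlet("/summary")
public class SummaryServlet extends HttpServlet {

  @Override
  protected void doGet(HttpServletRequest request,
      HttpServletResponse response) throws ServletException, IOException {
    response.setContentType("text/plain;charset=utf-8");
    PrintWriter out = response.getWriter();
    /* In the real services this would read from the database or from
     * files written by the data-processing tasks. */
    out.println("summary goes here");
  }
}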
My problem is that this approach is rather fragile and difficult to set up for new volunteers. I'm aware of that, and I'd like to improve it.
My question is: what Java frameworks should I be looking at for the applications described above? Bonus points if something is in Debian stable.
Note that "switch to $some_other_programming_language" is not a very useful answer to me, at least not for the larger applications. There's just too much existing code and not enough developer time to port it.
Thanks in advance!
All the best, Karsten
Hi Karsten,
A lot of people I respect seem to use Dropwizard for this sort of thing.
https://dropwizard.github.io/dropwizard/
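In case it helps to see the shape of it: a trivial Dropwizard service looks roughly like the following (just a sketch, names invented). It bundles Jetty, Jersey and the Metrics library, so the resource classes take the place of your servlets and you run the whole thing as a plain java process instead of deploying into Tomcat:

import io.dropwizard.Application;
import io.dropwizard.Configuration;
import io.dropwizard.setup.Bootstrap;
import io.dropwizard.setup.Environment;
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.Produces;
import javax.ws.rs.core.MediaType;

public class SummaryApplication extends Application<Configuration> {

  /* A JAX-RS resource class; Dropwizard serves it from an embedded
   * Jetty, so there is no separate Tomcat deployment. */
  @Path("/summary")
  @Produces(MediaType.APPLICATION_JSON)
  public static class SummaryResource {
    @GET
    public String summary() {
      return "{\"status\": \"ok\"}";
    }
  }

  @Override
  public void initialize(Bootstrap<Configuration> bootstrap) {
    /* Nothing to do for this minimal example. */
  }

  @Override
  public void run(Configuration configuration, Environment environment) {
    environment.jersey().register(new SummaryResource());
  }

  public static void main(String[] args) throws Exception {
    /* Started with e.g. "server config.yml" as arguments. */
    new SummaryApplication().run(args);
  }
}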
As for deployment on Debian (dunno if it's in the standard Debian repositories):
https://groups.google.com/d/msg/dropwizard-user/gv4TDQbcHBc/LGJz0egMNWQJ
Hope that helps.

Best,
Noah
You could try Spring. It seems to be a common framework for server-side Java. http://projects.spring.io/spring-framework/
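For a sense of what that looks like, a minimal (hypothetical) Spring MVC controller might be something like the sketch below; it would replace a hand-written servlet, though you still need a DispatcherServlet (or Spring Boot) wired up to dispatch requests to it:

import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

/* Hypothetical controller; names are made up and not related to the
 * actual services. */
@RestController
public class LookupController {

  @RequestMapping("/lookup")
  public String lookup(@RequestParam("ip") String ipAddress) {
    /* A real implementation would query the database here. */
    return "{\"ip\": \"" + ipAddress + "\", \"wasRelay\": false}";
  }
}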
Hi Noah,
this is indeed very helpful!
I started looking at the libraries used by Dropwizard and couldn't resist trying out the Metrics library, which provides very useful performance data. In order to use it, I had to switch from Ant to Maven, which kept me busy for a while. At a later point, when reading more about Dropwizard and alternatives, I found myself watching a Spring tutorial. And then I realized that I'll need an alternative to rsync'ing a few hundred thousand files from the metrics server for fetching recent Tor directory data. Wow, where did all these wonderful yaks come from, and who's going to shave them?!
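For anyone curious, the basic pattern with the Metrics library is a registry, timers or meters around the code you want to measure, and a reporter. The snippet below is only a made-up example, not code from metrics-db:

import java.util.concurrent.TimeUnit;
import com.codahale.metrics.ConsoleReporter;
import com.codahale.metrics.MetricRegistry;
import com.codahale.metrics.Timer;

public class ImportTimingExample {

  public static void main(String[] args) throws InterruptedException {
    MetricRegistry registry = new MetricRegistry();

    /* Report all registered metrics to the console once per minute. */
    ConsoleReporter reporter = ConsoleReporter.forRegistry(registry)
        .convertRatesTo(TimeUnit.SECONDS)
        .convertDurationsTo(TimeUnit.MILLISECONDS)
        .build();
    reporter.start(1, TimeUnit.MINUTES);

    /* Hypothetical timer around some unit of work, e.g. importing one
     * descriptor. */
    Timer importTimer = registry.timer("descriptor-imports");
    for (int i = 0; i < 100; i++) {
      Timer.Context context = importTimer.time();
      try {
        Thread.sleep(10); /* stands in for the actual work */
      } finally {
        context.stop();
      }
    }
    reporter.report();
  }
}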
tl;dr: thanks for the very valuable pointer! :)
All the best, Karsten