Hello,
I am a PhD student at Georgia Tech and I am collaborating with researchers at Stony Brook University to find an effective means of detecting block pages. We have access to a very robust set of both real pages and blocked pages (~2.4 million pages) which we are using to evaluate block page detection metrics.
We already have two measures to detect block pages and would like to evaluate your DOM similarity measure alongside our own metrics. Since we are planning on publishing our results, may we include your DOM similarity measure in our evaluation? Also, I would like to look into this similarity measure further; is there a paper that I can read?
Thanks, Ben Jones
On 2013-11-25 15:23, Ben Jones wrote:
> Hello,
> I am a PhD student at Georgia Tech and I am collaborating with researchers at Stony Brook University to find an effective means of detecting block pages. We have access to a very robust set of both real pages and blocked pages (~2.4 million pages) which we are using to evaluate block page detection metrics.
Awesome, we would love it if you could publish this dataset along with your code!
> We already have two measures to detect block pages and would like to evaluate your DOM similarity measure alongside our own metrics. Since we are planning on publishing our results, may we include your DOM similarity measure in our evaluation? Also, I would like to look into this similarity measure further; is there a paper that I can read?
Any feedback and/or contributions that your team makes are very much appreciated. We develop ooni-probe with researchers like yourselves in mind, and similarly, we publish and make available all of our work so that others can leverage our combined efforts.
I do not believe there is a published paper, and the DOM similarity measure is (rather) experimental. If you use the DOM similarity test in your paper, I believe it would be the first to do so.
--Aaron
> Thanks, Ben Jones
On 11/25/13, 4:23 PM, Ben Jones wrote:
> Hello,
Hi Ben,
Thanks for your interest in OONI :).
> I am a PhD student at Georgia Tech and I am collaborating with researchers at Stony Brook University to find an effective means of detecting block pages. We have access to a very robust set of both real pages and blocked pages (~2.4 million pages) which we are using to evaluate block page detection metrics.
This is an extremely valuable dataset. Under what sort of license would you be able to release it? Would it be possible to ship the dataset, for example, as part of the ooni-probe Debian package?
> We already have two measures to detect block pages and would like to evaluate your DOM similarity measure alongside our own metrics. Since we are planning on publishing our results, may we include your DOM similarity measure in our evaluation? Also, I would like to look into this similarity measure further; is there a paper that I can read?
Ah yes, I wrote that quite some time ago, but never wrote a paper. Keep in mind that I don't have a lot of experience with machine learning and did this as a personal pet project to try out some of the things I learned studying mathematics at university.
I wrote up a brief description of how the method works here: https://lists.torproject.org/pipermail/tor-dev/2012-August/003957.html.
I would be very interested in checking out what method you have applied to DOM similarity measurement. I also have a feeling there is some way of proving that an eigenvalue approach is somehow correlated to computing a labelled tree distance between the DOM pages.
~ Art.