Juha Nurmi <juha.nurmi@ahmia.fi> writes:
And what would you like to do over the summer so that: a) Something useful and concrete comes out of only 3 months of work. b) Your work will also be useful after the summer ends.
I would be interested to see some areas that you would like to work on over the summer, and how that would change the ahmia.fi user experience.
I have drafted a timetable for the possible new features to ahmia.fi:
https://docs.google.com/document/d/1XB42HM4uESYBAnoHHRuaqKMP64VFDI91Qa-CtIuy...
Hello Juha,
here are some comments on your proposal:
Search development
Full text search development
Popularity tracking (catch users' clicks and tell YaCy the popular pages): development of a popularity tracking feature for ahmia.fi and integration of that feature with the YaCy API (providing stats for popular pages and suggestions for relevant results)
1-3 workdays
Yes, this is definitely useful.
I would also like you to check out how backlinks work, and whether your crawler can start counting HS backlinks too. Mainly because popularity tracking is easily gameable, whereas backlinks might be harder to game (still definitely gameable though; SEO is crazy).
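To make the backlink idea a bit more concrete, here is a rough sketch in Python. It is only an illustration: the `(source_url, target_url)` input format and the function names are made up here, and nothing in it is an existing ahmia or YaCy interface. It counts, for each hidden service, how many distinct other hidden services link to it, so that many links from one site (or from its subdomains) count only once.

```python
from collections import defaultdict
from urllib.parse import urlparse


def onion_host(url):
    """Return the base .onion host of a URL, or None if it is not an onion URL."""
    host = (urlparse(url).hostname or "").lower()
    if not host.endswith(".onion"):
        return None
    # Collapse subdomains (www.abc.onion -> abc.onion); they cost nothing to create.
    return ".".join(host.split(".")[-2:])


def count_backlinks(links):
    """Count distinct referring hidden services per target hidden service.

    `links` is an iterable of (source_url, target_url) pairs, assumed to be
    something the crawler can export (hypothetical format).
    """
    referrers = defaultdict(set)
    for source_url, target_url in links:
        source, target = onion_host(source_url), onion_host(target_url)
        # Skip links from the public Internet and links a service makes to itself.
        if source is None or target is None or source == target:
            continue
        referrers[target].add(source)
    return {target: len(sources) for target, sources in referrers.items()}
```

Counting distinct referrers instead of raw links is the design choice that matters here: it does not stop someone from registering many onions to link to their own site, but it makes the cheapest form of gaming useless.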
To make sure that this section is done properly, I would suggest compiling a list of well-known HSes and verifying that they all appear at or near the top of the ahmia search results by the end of development of these features.
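Such a check could be automated along these lines. This is a sketch under assumptions: `search` stands for any callable that returns an ordered list of .onion addresses for a query (how it talks to ahmia or YaCy is left open), and the function name and the `top_n` cutoff are placeholders.

```python
def check_known_services(known, search, top_n=10):
    """Return the (query, onion) pairs whose onion is missing from the top results.

    `known` maps a search query to the .onion address expected for it.
    `search` is a caller-supplied function: query -> ordered list of onions.
    """
    missing = []
    for query, onion in known.items():
        results = search(query)[:top_n]
        if onion not in results:
            missing.append((query, onion))
    return missing
```

An empty return value means every well-known HS on the list showed up within the first `top_n` results; anything else is a list of queries worth looking at by hand.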
I would suggest using more than 1-3 workdays for this.
Use another crawler to search .onion pages from the public Internet
Search new .onion domains from different online sources
This is an excellent case to test open source crawlers like Heritrix and Apache Nutch
1 workweek
Yes, this is very useful.
Public open YaCy back-end for everyone
Let’s make our YaCy network open so anyone can join it with their YaCy nodes
This way we could get real P2P decentralization
Share an installation configuration package that joins a YaCy node to ahmia.fi’s nodes
1 workweek
I guess this is also useful.
Better edited HS descriptions
Design and development of a more useful and complete UI including more complete and exhaustive descriptions and details (e.g., show the whole history of descriptions and let the users edit it better)
1 workweek
Yes this seems like a good idea. Improving the UX is very important.
Because of the security-sensitive nature of ahmia, the UX should be security conscious too. For example, you shouldn't give your users too much confidence in the ordering of the search results, since a motivated adversary can probably influence it.
Maybe you could also expose some of your popularity/backlinks information to users, in case that lets them pick results more safely.
Comment and vote about the content (safe/unsafe)
Ahmia.fi needs commenting and rating systems for hidden services
It is useful to gather users' knowledge about the sites
1 workweek
I think that this needs more thinking.
The rating idea is trivially gameable. Do we assume that all users are good citizens?
Given that there are shitloads of phishing websites registered to ahmia, we take it that there are bad people out there who know of ahmia. How will the rating system interact with bad people? What about the commenting system? Is this also an argument against popularity tracking? How do we use these technologies usefully in the face of bad people?
Tor browser friendly version of ahmia.fi
Development of a JavaScript-free version of ahmia.fi
1 workweek
TBB has JavaScript enabled these days. I would probably spend this one week on other stuff.
Search API
1 workweek
What do you mean by this? Do other search engines provide this sort of API?
This would need more than one week to design and deploy properly, no?
Automated statistics and visualizations about hidden services and their content
Development of an Analytics feature
As a result of indexing the Tor network's content, ahmia.fi can produce authoritative and exact quantitative research data about what is published through the Tor network.
2 workweeks
Automated visualizations
It is very practical to visualize the data
2 workweeks
Both of the above items are statistics and they seem to require 1 month of development. Are there really that many stats that we can/should produce?
What kind of stats are you thinking of, other than the "number of HSes added per month", "number of ahmia visitors", etc.? BTW, we should be very careful that stats are privacy preserving.
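As one example of what "privacy preserving" could mean in practice, published counts can be coarsened so that single visits or single new services are not visible in the numbers. The sketch below is only an illustration: the function name, the bin size of 8 and the minimum of 16 are arbitrary values picked for the example, not a recommendation.

```python
def coarsen_count(count, bin_size=8, minimum=16):
    """Coarsen a raw count before publishing it.

    Counts below `minimum` are suppressed (None); all others are rounded
    up to the next multiple of `bin_size`. Both values are placeholders.
    """
    if count < minimum:
        return None
    # Integer ceiling division, then scale back up to a multiple of bin_size.
    return -(-count // bin_size) * bin_size
```

For instance, with these defaults a raw count of 17 would be published as 24, and a raw count of 15 would not be published at all.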
Show cached text versions of the pages
1 workweek
Useful. I thought you had this feature in the past though; no?
API development
In addition, ahmia.fi provides a RESTful API that lets other services use hidden service description information (see https://ahmia.fi/documentation/). Hidden services can integrate their descriptions directly into the hidden service list (see https://ahmia.fi/documentation/descriptionProposal/). Ahmia.fi knows which hidden services are online, and you can use the API to check a hidden service's online status. This API should be kept general and simple.
Integration with software that uses hidden services
Integration with Tor2web
Thanks to our recent suggestion, Tor2web has implemented a feature that provides secure and anonymous statistics within a day. I want to implement automatic fetching and handling of this data.
Ahmia.fi should fetch these and add each new .onion page
Child pornography is a plague for the Tor network, and a well-designed and authoritative entity may be useful for providing some filtering lists. To this aim we are currently maintaining a filter list by hand; it is already integrated with Tor2web and in use on nearly all the nodes of the Tor2web network (https://ahmia.fi/policy/, https://github.com/globaleaks/Tor2web-3.0/issues/25). In collaboration with Tor2web I want to develop an efficient and automated system to handle and share filtering information in a secure manner.
1 workweek
Hm, this is interesting but potentially controversial. Where is this data?
Development of a Content Abuse Signaling feature in order to allow fast handling of abuse comments; I want to implement a callback API in order to publish this data to Tor2web nodes in real time.
1-3 workdays
Ehm, so you are going to expose all the banned pages to Tor2web? Is this API going to be public? Will anyone be able to see the banned pages?
If it's not public, how are you going to protect it? Is this doable in 1-3 workdays? Is this worth doing?
GlobaLeaks integration
Currently, GlobaLeaks informs ahmia.fi to index new hidden services
Ahmia.fi could extend the visibility of GlobaLeaks in the search results
Together with GlobaLeaks: RESTful API according to GlobaLeaks’ needs
1 workweek
So you will make an API that allows people to submit HSes to ahmia? Will this be useable by anyone; can it be exploited? If not, how will you protect it? Is this really worth doing?
Estimated amount of work is 13 weeks.
All in all, the timetable looks good.
I'm quite excited about the changes to your crawler (that will give us a bigger list of HSes), and the changes to your indexing (popularity tracking/backlinks etc.). I think you should devote more time to these so that they are done properly. You currently estimated 1.5 weeks for those tasks, but maybe you could bump it to 3 or 4 weeks. OTOH, I don't know much about search engines so it might be easier than I think.
I'm also excited about the UX changes and statistics, but I'm not sure if I would devote one month just to statistics. Maybe steal some time from statistics and give it to the crawler/indexing and UX? Maybe not?
The API stuff and the "Integration with *" projects are probably harder/riskier to do than they seem. Are we sure we want to do them? Better to do fewer things properly, than many things sloppily. Or not?
I would also like to see the code base cleaned up a bit. For example, a README file and some basic description of what each file is doing. Probably also include the YaCy/crawler configs?
I would also like Ahmia to have some docs on the website. I would like to see a doc on how ahmia works, including how its components interact with each other. And I would also like to see a doc that explains to users the threat model of Ahmia; that is, what technologies ahmia has in place to defend against phishing, how likely they are to succeed, and how cautious users should be.