Hi everyone,
I'm working on ahmia.fi, the hidden service search engine, and you're reading status update #6.
During the last two weeks, I finished porting the Django app to the new structure. I'm also working on last-minute things before the new site goes online.
I will continue updating the documentation and adding unit tests to the project.
The code is not merged yet, but you're welcome to check it out on my forks [1] [2].
Since this status report is short, here is a list of the goals from my initial project proposal and the work that has been done on each.
Review code and infrastructure:
- Split the project into several repositories
- Improve documentation
- Automate testing (Travis CI)
- Track code quality (Landscape.io)
- Track requirements (Requires.io)
- Refactor each subproject
Improve search results:
- Better use of Elasticsearch (stemmers, shingles, term-centric search)
- Search results are now pages instead of domains
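To give an idea of what the stemmer and shingle setup can look like, here is a minimal sketch of index settings written as a Python dict. It is not the configuration used in the ahmia repositories: the filter and analyzer names (`english_stemmer`, `word_shingles`, `stemmed_text`, `shingled_text`) are made up for illustration, and only the `stemmer` and `shingle` token filter types and their parameters come from Elasticsearch itself.

```python
import json

# Hypothetical index settings: every name below except the Elasticsearch
# filter types and their parameters is an assumption made for this example.
INDEX_SETTINGS = {
    "settings": {
        "analysis": {
            "filter": {
                # Reduces words to their stem, e.g. "working" -> "work".
                "english_stemmer": {"type": "stemmer", "language": "english"},
                # Emits two-word shingles so that word order is indexed.
                "word_shingles": {
                    "type": "shingle",
                    "min_shingle_size": 2,
                    "max_shingle_size": 2,
                    "output_unigrams": False,
                },
            },
            "analyzer": {
                "stemmed_text": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "english_stemmer"],
                },
                "shingled_text": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "word_shingles"],
                },
            },
        }
    }
}

# The body that would be sent when creating the index.
SETTINGS_JSON = json.dumps(INDEX_SETTINGS, indent=2)
```

The field mappings that would attach these analyzers to page fields are left out on purpose, since their syntax depends on the Elasticsearch version.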
Improve UI/UX:
Not much work has been done toward this goal beyond porting the old pages to the new design. All pages now use the new design.
Gather more statistics:
- PageRank is now used to compute an authority score for each page
- I suggested that we could use a self-hosted statistics framework like Piwik [3], but no decision has been made
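For readers unfamiliar with PageRank, here is a minimal sketch of how an authority score can be computed from a link graph by power iteration. This is a textbook version, not the code from the crawler: the function name `pagerank`, the damping factor of 0.85, the iteration count and the tiny link graph are all assumptions made for the example.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Return a dict mapping each page to its PageRank score.

    `links` maps a page to the list of pages it links to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start from a uniform distribution over all pages.
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's score evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its score everywhere.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical link graph: pages "a", "b" and "d" all link to "c".
LINKS = {"a": ["c"], "b": ["c"], "d": ["c"], "c": ["a"]}
SCORES = pagerank(LINKS)

# Ranking results by authority score, highest first.
RANKED = sorted(SCORES, key=SCORES.get, reverse=True)
```

In this toy graph, "c" ends up with the highest score because three pages link to it, and "a" comes second because its only inbound link is from "c".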
Use stats to better rank search results:
- Results are ranked by authority score
Make sense of the indexed info to understand a search's meaning:
- Shingles enable us to differentiate these two queries: "i'm not happy i'm working" and "i'm happy i'm not working".
- Synonyms could be used by the search algorithm if we provided a synonym dictionary. No work has been done on this yet.
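To illustrate why shingles help with the two queries above, here is a small pure-Python sketch that builds two-word shingles by hand. It only mimics the idea behind the Elasticsearch shingle filter; the `shingles` helper and its whitespace tokenization are assumptions for the example, not what the search backend runs.

```python
def shingles(text, size=2):
    """Return the list of `size`-word shingles of `text`, in order."""
    words = text.lower().split()
    return [" ".join(words[i:i + size]) for i in range(len(words) - size + 1)]


QUERY_1 = "i'm not happy i'm working"
QUERY_2 = "i'm happy i'm not working"

# Both queries contain exactly the same words...
SAME_WORDS = sorted(QUERY_1.split()) == sorted(QUERY_2.split())

# ...but their shingles differ, so word order becomes searchable.
ONLY_IN_1 = set(shingles(QUERY_1)) - set(shingles(QUERY_2))
ONLY_IN_2 = set(shingles(QUERY_2)) - set(shingles(QUERY_1))
```

Here `ONLY_IN_1` contains "not happy" and "i'm working", while `ONLY_IN_2` contains "i'm happy" and "not working", which is exactly the difference in meaning that a plain bag-of-words match cannot see.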
Make a Google Trends-like interface to visualize searches over time:
No work has been done toward this optional goal. Some stats features were even dropped from the new site because they were "domain-centric", whereas a search engine needs to be "page-centric". We could probably index searches in Elasticsearch and use the Date Histogram Aggregation [4] to display trends.
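As a rough sketch of the trend idea above, the request body for such an aggregation could be built like this. Nothing here exists in the project yet: the function name `search_trend_query` and the field names `query_text` and `timestamp` are hypothetical and assume each search is indexed as a document with those two fields. Recent Elasticsearch releases name the bucket size parameter `calendar_interval`, while older releases used `interval`, so that key would need to match the deployed version.

```python
def search_trend_query(term, interval="day"):
    """Build a request body counting searches for `term` per time bucket.

    Assumes a hypothetical index of searches with a `query_text` field
    and a `timestamp` date field.
    """
    return {
        # Only the aggregation buckets are needed, not the matching documents.
        "size": 0,
        "query": {"match": {"query_text": term}},
        "aggs": {
            "searches_over_time": {
                "date_histogram": {
                    "field": "timestamp",
                    # Named "interval" on older Elasticsearch releases.
                    "calendar_interval": interval,
                }
            }
        },
    }


WEEKLY_TREND = search_trend_query("bitcoin", interval="week")
```

Each bucket in the response would then carry a date and a document count, which is enough to draw a searches-over-time chart.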
Make stats available with the API:
No work has been done toward this optional goal. Some API endpoints were also dropped because they were domain-centric. It would be great to have an API with a coherent URL scheme. I think Django REST Framework [5] can help design that API while keeping the code simple.
That's it for this week. Have a nice weekend.
Ismael R.
[1] https://github.com/iriahi/ahmia-site
[2] https://github.com/iriahi/ahmia-crawler
[3] https://piwik.org/
[4] https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggre...
[5] http://www.django-rest-framework.org/