A search engine built by the crowd, that does not suck

Google’s PageRank has revolutionized search, but over the last ten years it has forced every other search engine to look and act like Google. This has made it almost impossible to establish a better, modern search engine other than Google, given that its popularity is the major indicator of its quality. Nice for Google, not so nice for healthy competition.

Meet the DwellRank

During our work at archify, a private search engine across the content of your browsing history, Max and I realised that there is one core metric that could be an alternative to PageRank. We learned that the biggest commitment a user can make to a web page and its content is the time the user spends on it. We can safely assume that a web page on which users spend only a few seconds is not as important as a page on which they spend several minutes. That’s a much stronger commitment than hitting a like button can ever be. And it’s a stronger indication of quality content than links between machines. We call this new way of ranking pages DwellRank.
DwellRank is a better way to find the right search results, based on the time users actually spend on the page.
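
As a rough sketch of the idea (not the formula Blippex actually uses, which is not given in this post), a dwell-based score could be as simple as the average time spent per visit for each URL. The function name `dwell_scores`, the input shape, and the use of a plain mean are all assumptions made for illustration.

```python
from collections import defaultdict


def dwell_scores(visits):
    """Return the mean dwell time in seconds for each URL.

    `visits` is an iterable of (url, seconds_spent) pairs. The input shape
    and the plain mean are assumptions for this sketch, not Blippex's formula.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for url, seconds in visits:
        totals[url] += seconds
        counts[url] += 1
    return {url: totals[url] / counts[url] for url in totals}


# Hypothetical sample data: two long visits to one page, one short visit to another.
visits = [
    ("https://example.org/long-read", 240),
    ("https://example.org/long-read", 180),
    ("https://example.org/clickbait", 4),
]

# Rank pages by descending mean dwell time.
ranked = sorted(dwell_scores(visits).items(), key=lambda item: item[1], reverse=True)
for url, score in ranked:
    print(f"{score:6.1f}s  {url}")
```

A page that holds readers for minutes ends up ahead of one that is abandoned after a few seconds, regardless of how many links point to either.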

Meet Blippex

After two years of valuable experience with archify we decided to build a search engine called Blippex, which is based on the DwellRank. We’ve built extensions for all major browsers (sorry IE), which send the URL of each web page you visit and the time you spent on it to our servers. The database is built from this human input, which is what powers the Blippex search engine. The search engine is available to everyone, even if you have not installed the extension (but you should install it!).
The challenge around Blippex is finding enough people who are willing to share their data. As a result, we have created many privacy safeguards. If you use the extension we will only save three data points:

  • the URL
  • the current time
  • the time you spent on the page
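
To make the list above concrete, here is a minimal sketch of what such a record might look like. The class name `PageVisit`, the field names, the Unix-timestamp format, and the JSON encoding are assumptions for illustration; the post does not describe the actual wire format.

```python
import json
import time
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class PageVisit:
    """One record with exactly the three data points listed above."""

    url: str            # the URL
    timestamp: int      # the current time (seconds since the epoch, assumed)
    dwell_seconds: int  # the time spent on the page


def to_payload(visit):
    """Serialize a visit to JSON; no user or browser identifier is included."""
    return json.dumps(asdict(visit), sort_keys=True)


visit = PageVisit(
    url="https://example.org/article",
    timestamp=int(time.time()),
    dwell_seconds=95,
)
print(to_payload(visit))
```

Keeping the record to these three fields means there is simply no place in it for a user ID, cookie, or browser fingerprint.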

It is very important to make clear that we do not save anything which could possibly identify you. If you have any ideas on how to improve the privacy and security of Blippex even further, we are happy to hear and implement them. But let me get back to Blippex and why we built it.

A search engine built by the crowd, that does not suck

Search should be controlled by the people using it

Search should not be a black box. People should be in control of pretty much everything. At Blippex, people provide the data for the search engine and help make it better. They are also in control of the search results and can easily adjust the different parameters.

Search Parameters should be transparent

Every parameter of the search should be available to the people who are using it. That said, it’s a work in progress, especially the scoring of search terms. It is already possible to control how many days the search should go back, and to influence how the final score weighs search term frequency (how often the search term is mentioned on the page) against DwellRank.
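
The two user-controlled parameters described above could be sketched as follows. This is a hypothetical illustration, not the Blippex scoring code: the linear blend, the parameter names `days_back` and `dwell_weight`, the page dictionary keys, and the assumption that both signals are normalized to the range 0 to 1 are all made up for the example.

```python
from datetime import datetime, timedelta


def blended_score(term_frequency, dwell_rank, dwell_weight):
    """Linearly blend two normalized signals (both assumed to lie in [0, 1]).

    dwell_weight=0.0 ranks purely by term frequency,
    dwell_weight=1.0 ranks purely by DwellRank.
    """
    if not 0.0 <= dwell_weight <= 1.0:
        raise ValueError("dwell_weight must be between 0 and 1")
    return (1.0 - dwell_weight) * term_frequency + dwell_weight * dwell_rank


def search(pages, now, days_back, dwell_weight):
    """Keep pages seen within the last `days_back` days, best score first."""
    cutoff = now - timedelta(days=days_back)
    recent = [page for page in pages if page["last_seen"] >= cutoff]
    return sorted(
        recent,
        key=lambda page: blended_score(
            page["term_frequency"], page["dwell_rank"], dwell_weight
        ),
        reverse=True,
    )


# Hypothetical index entries.
now = datetime(2020, 1, 31)
pages = [
    {"url": "a", "term_frequency": 0.9, "dwell_rank": 0.2, "last_seen": now - timedelta(days=2)},
    {"url": "b", "term_frequency": 0.3, "dwell_rank": 0.8, "last_seen": now - timedelta(days=5)},
    {"url": "c", "term_frequency": 0.99, "dwell_rank": 0.99, "last_seen": now - timedelta(days=60)},
]

for weight in (0.0, 1.0):
    results = search(pages, now, days_back=30, dwell_weight=weight)
    print(weight, [page["url"] for page in results])
```

Moving `dwell_weight` from 0 to 1 flips the order of pages "a" and "b", while page "c" only appears once `days_back` is large enough to include it.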

Search Ranking data should be available

Although we know the basic parameters of search algorithms, their complexity is hidden in a big black box. Blippex intends to open its search algorithm data. As a first step we will include the scoring details in our public API, which will be available to everyone. This is the same API that Blippex.org uses.

The Search database should be open

Not only should the ranking be transparent, but also the data itself. Especially at Blippex, where the data comes from our users, we think it is our duty to give it back to the users. Therefore we will publish a dump of our database every month.

Searching needs strong privacy

We do not think that there is anything wrong with tracking your users, because it is sometimes the only way to improve your service. But users should always be aware of what is being tracked about them and how it is stored or even shared. Search engines have a special duty because they are dealing with very sensitive data about their users and their interests. That’s why we are trying to keep privacy at the highest possible level.

We are releasing Blippex today and invite you to participate in making a search engine that doesn’t suck and is built by the people, for the people.

Gerald & Max
PS: Why the heart in the Blippex logo? Only humans have a heart!

25 thoughts on “A search engine built by the crowd, that does not suck”

  1. I like your concepts, but I have some ideas:
    1. P2P: let your clients be servers. How do you trust them? Redundancy. If I send calculation x to servers 1, 2, 3 and 1 and 3 return y but 2 returns y’, then trust 2 less: nodeindextrust = nodeindextrust – beta (could use some kind of sigmoid function).
    2. Allow algorithms to compete. Ranking has lots and lots of signals: social, PageRank, content, metadata, domain info, etc.
    3. Treat it like a giant parsed SQL database of the entire internet: select * from pages where words contains ‘dogs’ and image_count > 3 and form_count > 1 order by dwellrank
    4. Dwell time could have buckets like 0-10 seconds, 11 s – 1 min, etc. The bucket sizes could be tweaked.
    5. Learn from Wikipedia, bake advertising in from the start: somenode.returnadvertisers(keyword, ad_algo)
    6. Work with CSS and XML data too. Tables: node_data, node_connections, node_types, indexs
    7. Work with multiple clients.

  2. Interesting idea guys although I have a few questions:

    A) What is to stop someone hiring a bunch of people on Fiverr / oDesk etc. and having them deliberately spend time on their site in large numbers to increase its DwellRank?

    B) Doesn’t this potentially reward long articles over those that are short and to the point? Longer content isn’t always better.

    • We had an extension for archify for IE, and it took 90% of our development power to maintain it for just a few users. It is simply a pain to develop an extension for IE. When we can afford it, we will do it of course!

  3. This is a great idea and I agree that a (new) search engine should be out there that gives Google at least a little hard(er) time, but,…

    Time on Site / Page is really just one factor and indicator of the importance of a document (out of many for giving search results a weight).

    Google itself is using it (time between SERP – page and bounce to SERP) as one of their metrics as well and PageRank for them is today just a small indicator. I am sure you guys thought about this, but how will you handle: Exits (ie. not returning to the SERP page and leave the site; or will this be handled through the add-on?), Long vs. Short content? Some pages are important, but have only a couple of paragraphs or an image etc. but are still valuable.

    I think building a search engine is not a trivial job and respect that you are all going to shoot for it!

    • Hi,

      yes, it is not an easy job and it is an experiment for us, let’s see how it works out!
      The extension measures time only while both the tab and the browser are active, and it has a cap (a few minutes is the maximum). Our experience from archify shows us that this is quite accurate.
      The DwellRank is based not only on the time but also on the number of visits, and maybe we will add other criteria too. As I said, it is just a start :)

  4. I’m really liking what I’m reading here. Just a few questions about it. Does the extension track my search query on all searches I perform on any search engine? Then it measures how long I spend on each result I see to that query? The resulting data will either push up or pull down that URL in the results of Blippex? Or is it time on every page I visit that gets measured? What about tabs I keep open all day? Trying to understand how it works.

    Also, what are the safeguards from spammers? I can see this getting abused fast by hordes of spammers installing the extension and then camping out on their spammy websites. Does it just track where we go and measure how long we are on each domain/page?

    Fantastic concept. Not sure if this will be helpful to you or not, but I’ve long had an idea to improve search that takes quality raters a step further. I do Google searches all the time where only one or two results in the top ten are actually helpful. I wish there was a Chrome extension that would let me score each URL in the result as either answering my query or not answering my query. If this could be incorporated some way into Blippex I think that would be great. When I have specific searches I tend to spend a lot of time on every page looking for the information I’m trying to find. So time on page might not always be relevant necessarily. A scoring system or satisfaction score or something might help.

    Anyway, good luck and I will be installing the extension.

    • Thank you, we will try our best!
      The extension measures the time for every page you visit, regardless of whether you are coming from a search engine or not.
      We are still working on the best formula for the DwellRank; the time spent is just one (but the biggest) value in there.

      And we have some ideas how to get rid of spammers/SEO people, but this is right now a luxury problem, first we need a bigger index :)

  5. How do you include new websites in the database? If a site was released today and nobody knows about it, how will Blippex find it? Will you still rely on other search engines, or do I, as the owner / webmaster of the site, need to visit it with the Blippex add-on installed?

    • Yes, you need to visit it with the plugin installed, otherwise Blippex doesn’t know about it. We think that besides search engines, a lot of people find stuff via social networks, email, etc., so someone has to see it, and hopefully sometime in the future one of the people who has seen it will have the Blippex extension installed.

    • Yes, the “Its” problem is hopefully fixed now, we are sorry, we are not native speakers :)
      Yes, Security != Privacy, we are very aware of this!
      Of course I know about the EFF stuff, but we don’t log any browser headers or anything else (I am not sure if the request from the extension to the server sends them anyway). It would be really great if there were, for example, a Tor implementation in JavaScript that we could include in the extension; it would make every request even more anonymous.

  6. I can’t attach a screenshot here, but at the moment I have 7 open tabs. 3 of them have been there since yesterday: I wanted to read them thoroughly, haven’t had the time or mood so far, but still hope to get back to them. 1 tab has been open for more than a week: it’s a book to which I turn once in a while. Since the invention of tabs, time is a loose concept, and pieces of time spent on different pages are not of equal value. How are you gonna tackle this issue and differentiate?

  7. This seems like a cool idea for a search engine and I can think of other good ways to put the Dwell factor data to use. The real-time adjustment of the Dwell factor is pretty cool. I do foresee some potential problems with displaying relevant results for some searches and with people gaming the system. Will be interesting to see how this goes.

  8. How will you include new pages/websites in the database? If I see it correctly, a page has to be visited first by someone who has the add-on installed.

    Let’s say I create the best content in the world for a specified “keyword”. How will Blippex decide which is the best for the searcher? The new content or old content which already works well? What if the new content is better than the old one? This way, the search results on Blippex will depend to some percentage on the results generated by other search engines.

    What other parameters are you considering important in the ranking system?
    I checked some keywords, but the results were not the best. I think language-based search results should be implemented, so that when I do a search I only get results in languages which I understand. For example, I searched for “Denzel Washington”. There were results in English, Spanish, Dutch and French, but I was only interested in the English results. Do you plan to implement this?

    Blippex is a good idea, you have my support! I already installed the add-on and I will try to come up with other suggestions.

    • Right now we don’t check what language a page is in (it’s also a privacy issue: as long as you don’t have enough users, it would be easier to track down the one person who speaks a rarely used language), but of course it makes sense to implement something here in the future.

      We are still experimenting a lot with the ranking, and any ideas to make it better are welcome. We will release the formula we are using right now in the coming weeks, but first we have to learn a little bit more once we have more data.

  9. I assume that the Dwell Rank will be much tighter than has actually been mentioned here, because it really is more complex than “person spends X minutes on site A, therefore it is better than site B”. As has been mentioned above by HP, what is the consideration for pages with short (yet fulfilling and useful) content? A person may search for a very specific query, hit the top result, find their answer, and bounce. This is a very effective search path yet according to a simple “time on page” algorithm, this would be deemed as ineffective and a poor searcher experience?
