I’ve been thinking about a new search engine for a while, and the need for one has recently been raised in communities like HN. In this post I will share my thoughts on the features I would like to see in the next search engine.
As others have already pointed out, creating a new internet search service is no longer an impossible task. Storage and computation are far cheaper than they were 20 years ago.
Alternative search engines have been created, but none has been a “good enough” replacement for Google. Why? Because all of them tried to compete directly with it, the main product of the third-largest company in the world by market cap. If you look at the alternatives, they offer one or two new features, but the search experience remains exactly the same. A new startup needs to offer a 10x improvement over existing competitors to succeed. Privacy alone is not enough.
First of all, the business model. I want subscriptions to be the main source of income for the company. As an engineer, search is one of the most useful tools I have. I want a premium service, and I am willing to pay for it. Getting revenue directly from customers could solve three main problems that people have with Google: ads, user privacy, and biased page rankings. I will not discuss the first two here.
I expect a search engine to present a list of pages whose content matches my queries. This raises some issues, like how to rank quality pages correctly and how to filter spam content. These two issues are related to each other. The spam one has already been solved, if we define spam as “irrelevant or unsolicited” content: just create rules to recognize and filter spam results. The ranking problem, instead, is still open.
In the beginning, there were curated collections of web pages per topic, such as the Yahoo directory. That solution wasn’t scalable in 1994 and still isn’t today, with billions of web pages. Then Google introduced PageRank, an algorithmic way to rank a web page based on the number of links on the web that point at it. Google has changed a lot in 20 years, and so have its ranking techniques and filters. The only problem is that the ranking is algorithmic. This means there is a precise way to build a webpage that the Google engine likes and pushes to the top of the search results. That’s why SEO experts exist.
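To make the link-based idea concrete, here is a minimal sketch of a simplified PageRank-style iteration, which is how ranking by incoming links is commonly formalized. It is not Google's production algorithm; the toy graph, the damping factor of 0.85, and the function name `pagerank` are assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    links maps each page to the list of pages it links to.
    """
    # Collect every page that appears as a source or as a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start from a uniform distribution.
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives a small baseline share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical toy web: three pages link to "c", and "c" links back to "a".
toy_web = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_web)
best_page = max(ranks, key=ranks.get)
```

Because the score depends only on the link structure, anyone who can create or buy links can push a page up, which is exactly the opening that SEO exploits.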
So we arrive at 2022, where if I search on Google with a query vaguer than a copy-pasted error message, I get lost among Amazon products, Medium posts and the marketing content of some anonymous startup with a good SEO expert. Is this spam? Yes, it’s unsolicited, low-quality content not strictly related to my search. Addressing this problem should be the main focus of a new search engine. Users are a good resource here: their behaviour on a webpage can be related to its quality, and they can also report spam or unwanted results. A user can recognize poor-quality content on a webpage in less than 5 seconds. Can an AI do that?
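As a rough illustration of how user behaviour could feed into ranking, the sketch below turns visits and spam reports into a per-page quality score. Everything in it is an assumption: the `Visit` record, the 5-second bounce threshold (borrowed from the figure above), and the extra weight given to explicit reports. A real system would also need to defend against users gaming these signals.

```python
from dataclasses import dataclass


@dataclass
class Visit:
    url: str
    dwell_seconds: float          # time the user stayed on the page
    reported_spam: bool = False   # explicit "this is spam" feedback


def quality_scores(visits, bounce_threshold=5.0, report_penalty=2.0):
    """Score each URL in [0, 1]; quick exits and spam reports lower the score."""
    totals = {}
    for visit in visits:
        stats = totals.setdefault(visit.url, {"visits": 0, "bounces": 0, "reports": 0})
        stats["visits"] += 1
        if visit.dwell_seconds < bounce_threshold:
            stats["bounces"] += 1
        if visit.reported_spam:
            stats["reports"] += 1

    scores = {}
    for url, stats in totals.items():
        # Reports count more than silent bounces (hypothetical weighting).
        bad = stats["bounces"] + report_penalty * stats["reports"]
        scores[url] = max(0.0, 1.0 - bad / stats["visits"])
    return scores


# Hypothetical visit log.
log = [
    Visit("https://example.com/guide", 120.0),
    Visit("https://example.com/guide", 90.0),
    Visit("https://example.com/guide", 3.0),
    Visit("https://example.com/ad-page", 2.0, reported_spam=True),
    Visit("https://example.com/ad-page", 4.0),
]
scores = quality_scores(log)
```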
Another good feature is understanding the category of the query. The same keywords could be used when searching for a product, a blog post, a news article, and so on. Having an option to tell the search engine where to look for content would be useful. Some projects are already trying to address this. Beyond that, there is privacy-oriented personalization. Similar to a recommendation system, it might be possible to group similar users, present them with related results, and enrich their searches with other relevant topics and content.
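A category option could look something like the sketch below, assuming every indexed result already carries a category label. The `Result` type, the category names and the scores are all hypothetical; how pages would get their labels is left open.

```python
from dataclasses import dataclass


@dataclass
class Result:
    url: str
    category: str   # e.g. "product", "blog", "news" (hypothetical labels)
    score: float    # relevance score from the ranking step


def search(results, category=None):
    """Return results sorted by score, optionally restricted to one category."""
    selected = [r for r in results if category is None or r.category == category]
    return sorted(selected, key=lambda r: r.score, reverse=True)


# Hypothetical matches for the same keywords.
matches = [
    Result("https://example.com/shop/keyboard", "product", 0.9),
    Result("https://example.org/keyboard-review", "blog", 0.7),
    Result("https://example.net/keyboard-recall", "news", 0.8),
    Result("https://example.org/typing-tips", "blog", 0.6),
]
blog_urls = [r.url for r in search(matches, category="blog")]
all_urls = [r.url for r in search(matches)]
```

With the filter set to "blog", the product page no longer crowds out the posts, even though it has the highest score overall.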
One more note: people also use social media like Twitter and Reddit to discover content relevant to their niches, or interesting quality content in general. So one of the functions of those platforms is to provide curated links to material that is otherwise difficult to discover. Unfortunately, they are filled with ads and marketing material, and they require users to browse constantly so as not to miss something. Indexing this material would be useful too.