There was quite a stir in the SEO community recently about something other than ChatGPT, when it was revealed that part of the source code for the Russian Yandex search engine had been leaked. More importantly, the leaked code seems to contain over 17,000 ranking factors allegedly used by Yandex.
If you like to take deep dives into how search engines work, it’s worth exploring this code, as many have already been doing. The Search Engine Land article linked above contains some excellent resources for such research, and this LinkedIn post by Lukasz Zelezny nicely summarizes most of the standout revelations. But the most thorough analysis to date of the similarities and differences between Yandex and Google has been published by Mike King.
However, seeing that Yandex is the dominant engine only in Russia (with some presence in Turkey and other countries), we think the more pressing question is: Can we learn anything about Google ranking factors from the Yandex code?
Before trying to answer that question, here are some important facts about the leaked code:
- It likely does not contain all of Yandex’s ranking factors.
- Yandex has said this is from an older version and does not necessarily reflect the code they use currently.
- While we have some of the initial weightings of these ranking factors, they are only the relative weights assigned before the factors are fed into the machine learning algorithms. They tell us which factors are given more priority, but not how the algorithm chooses which to use in the mix for any given query.
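To make that last point concrete, here is a minimal sketch of what a set of initial weightings can and cannot tell us. The factor names and weights are invented for illustration and are not taken from the leaked code. The sketch simply combines factor values as a weighted sum, which is an assumption about how a baseline score might be formed, not a description of what Yandex does.

```python
# Hypothetical factor names and weights, invented for illustration only.
# Nothing here is copied from the leaked Yandex code.
initial_weights = {
    "link_authority": 0.5,
    "text_relevance": 0.3,
    "url_query_match": 0.2,
}


def baseline_score(factor_values, weights):
    """Weighted sum of factor values; factors missing from the page count as 0."""
    return sum(w * factor_values.get(name, 0.0) for name, w in weights.items())


# A page described by two of the three hypothetical factors.
page = {"link_authority": 0.8, "text_relevance": 0.6}
score = baseline_score(page, initial_weights)
print(score)
```

A score like this shows which factors start out with more priority, but a machine learning layer sitting on top of it can reweight or ignore any of them for a given query, and that is the part the leak does not show us.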
Does the Yandex Leak Tell Us Anything About Google?
First (and not surprisingly), many of the factors match those we know or strongly suspect Google uses.
Second, there are factors that are highly unlikely to be found in Google. For example, Yandex favors Russian sites and Russian media over sources outside Russia.
Third, there are factors Google has explicitly told us they do not use. Most prominent among these are user behavior factors, including:
- Bounce rate
- Pogo-sticking (visiting multiple search results in quick succession from the same query)
- Dwell time (how long a user stays on a site after clicking from a search result)
Concerning these, Google spokespersons have said they may use them in isolated tests, but find them too unreliable to use as actual ranking signals. In addition, Google would only have access to some user behavior metrics via a Google property like Google Analytics or the Chrome browser. About 28 million sites use GA, but that’s only a small portion of the estimated 2 billion sites that exist on the web.
Other factors Yandex uses that Google most likely does not:
- Meta keywords
- A site authority score for weighting the authority of link sources
- Too many server or 4xx errors
- Many aspects of URL construction/formatting
- How well a URL matches the query
- Depth of crawl from the home page
- Too much text on a page
What Does the Yandex Leak Teach Us Then?
Perhaps one of the most important takeaways from the leak of the Yandex code is how much it confirms many things we've believed are search engine essentials, and that are therefore likely (or, in many cases, certainly) part of Google's algorithms as well. There are also things that Yandex lacks that Google almost certainly includes.
Yandex vs Google Similarities and Differences
- PageRank: Yandex includes a version of Google's PageRank. Google recently confirmed that although PageRank, the link analysis algorithm Google was originally built on by founders Sergey Brin and Larry Page, is much less involved in ranking these days, it is still present in the ranking algorithms.
- Stratified Crawling: Like Google, Yandex uses a multi-distributed crawler system. While Yandex's is dual, with a crawler for real-time and another for general crawling, Google's is tripartite. In addition to a real-time crawler, Google separates out crawlers for pages to be regularly crawled from those to be rarely crawled.
- Both Yandex and Google make use of BERT to better understand and process queries.
- Google, however, is slightly more advanced in its natural language processing. While Yandex confines itself to indexing by bigrams and trigrams (two- and three-word sequences), Google also employs longer n-grams, or phrase-based indexing.
- Both engines process queries by first checking to see if their general index of popular results satisfies the query (Yandex calls this level Metasearch). If not, the query is submitted to thousands of virtual machines simultaneously, each with its own set of priorities, and those machines make weighted recommendations that are then evaluated and formed into the final result.
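For readers unfamiliar with the bigram and trigram terminology used above, here is a minimal sketch of how a query breaks down into n-grams. It is a generic illustration of the concept, not code from either search engine, and the `ngrams` helper and sample query are our own.

```python
def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Sample query, split into words on whitespace.
query = "best running shoes for flat feet".split()

bigrams = ngrams(query, 2)    # two-word sequences
trigrams = ngrams(query, 3)   # three-word sequences
four_grams = ngrams(query, 4) # a longer phrase window

for gram in bigrams:
    print(" ".join(gram))
```

The longer the window, the more of the original phrasing each indexed unit preserves, which is why phrase-based indexing can capture meaning that two- and three-word sequences miss.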
Conclusion: The Yandex leak is certainly a fascinating peek into the inner workings of a major search engine. It does confirm many things we have long believed to be essential to the ranking systems of all such engines. However, Yandex is not Google. It is not bound to work the same way as Google, nor to follow whatever philosophical or ethical guidelines Google maintains for search.
Moreover, this tweet by SEO Joe Hall might say it best:
We not only still don’t have all the “ingredients” of Google’s recipe for ranking, we also don’t know exactly how they mix those ingredients in their SERP recipes. As noted above, without knowing the exact way Yandex applies weightings to a given query, we don’t even have the “recipe” for Yandex now.
Our best advice remains the same as always: instead of chasing individual ranking factors, concentrate on building the best site and content you can by balancing the elements of seoClarity’s proven framework for SEO success:
- Usability: the technical soundness and friendliness of your site for both users and search engines
- Relevance: the degree to which your pages, content, and entire site sync up with and meet the needs and intentions of your intended audience
- Authority: the mix of signals that tell a search engine your content and site should be highly ranked
The author wishes to acknowledge invaluable insights from the following people, who spent many hours poring over the Yandex documents: Ryan Jones of Razorfish, Mike King of iPullRank, and Lukasz Zelezny.