Exploring Yandex Search Ranking Factors with Leaked Insights

January 31, 2023

The search marketing community is trying to make sense of the leaked Yandex repository, which contains files listing what look like search ranking factors. Ryan Jones (@RyanJones) believes the leak is a big deal and has already loaded some of the Yandex machine learning models onto his own machine for testing. Although some may be looking for actionable SEO clues, the general agreement is that the leak is more useful for gaining a general understanding of how search engines work (@RyanJones, January 29, 2023). As Ryan said, "[If] you want hacks or shortcuts those aren’t here. But if you want to understand more about how a search engine works, there’s gold."

Ryan believes that we can learn a lot from examining the leaked list of Yandex ranking factors, but that examining the list alone is not enough. He explains that while Yandex is not Google, it uses many technologies that Google invented, such as PageRank and BERT. He also states that the factors and the weights applied to them will likely vary across search engines, but that the computer science methods used to analyze text relevance will be very similar. Additionally, he notes that the code calculates more ranking factors than appear in the leaked lists floating around, and that some factors SEOs may assume are positive carry negative weights, and vice versa. Link: https://www.seobility.net/en/blog/yandex-ranking-factors/

It's widely believed that Yandex uses 1,923 ranking factors for its search engine. However, Christoph Cemper (LinkedIn profile), founder of Link Research Tools, has heard from friends that there are many more ranking factors than initially thought. According to these sources, Yandex employs 275 personalization factors, 220 “web freshness” factors, 3,186 image search factors, and 2,314 video search factors. The search engine also uses hundreds of link-related ranking factors. This indicates that there are far more than the 200+ ranking signals Google originally claimed to employ in its SERPs.

The Yandex leak has raised questions about who really knows Google's entire algorithm. It is striking how well organized the ranking factors were in the leaked files, and many are now asking whether Google keeps a comprehensive spreadsheet that contains all of its ranking factors.

Christoph Cemper, an SEO expert, commented on this topic to Search Engine Journal, saying that it "always seemed absurd" to him that not even Google employees would know the entire algorithm. He goes on to say that such a complex system needs to be documented and that even code could be leaked. This data leak may therefore help move the industry away from thinking of Google’s algorithm in its current terms.

The leaked Yandex files offer a view of part of how one search engine (Yandex) ranks search results, though the data doesn't show how Google works. Some of the insights relate to MatrixNet, the Yandex neural network introduced in 2009 (the announcement is preserved on archive.org). Contrary to some claims, MatrixNet is not equivalent to Google’s RankBrain, a narrowly scoped algorithm focused on understanding the 15% of search queries that Google has not seen before. In October 2015, Bloomberg published an article (Archive.org snapshot) revealing that RankBrain had been introduced into Google’s algorithm that year, six years after the introduction of Yandex MatrixNet. According to the article, RankBrain is limited in purpose: it is designed to guess at words and phrases similar in meaning to queries it isn't familiar with, so that it can handle never-before-seen search queries effectively. In contrast, MatrixNet is a machine learning algorithm that classifies a search query and applies appropriate ranking algorithms accordingly. A 2016 English-language announcement further detailed how MatrixNet works in Yandex search.

MatrixNet is a powerful ranking algorithm that allows users to generate complex formulas and customize them for specific search queries. This ensures that the quality of ranking for one type of query does not affect the overall performance of searches for other types. Unlike other ranking algorithms, MatrixNet permits users to make adjustments and tweaks to certain parameters without needing to completely overhaul the whole system. Furthermore, MatrixNet can automatically choose sensitivity levels for different factors in the ranking formula. With these advanced features, MatrixNet has distinguished itself from RankBrain and other comparable algorithms. https://www.deepcrawl.com/blog/rankbrain-matrixnet-same/
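The idea of classifying a query and then applying a formula tuned for that class can be illustrated with a toy example. The sketch below is not MatrixNet and is not taken from the leaked files; the class names, factor names, weights, and the keyword-based classifier are all hypothetical, chosen only to show how per-class formulas keep a change for one query type from touching another.

```python
from typing import Dict

# Hypothetical per-class weight tables. None of these class names,
# factor names, or numbers come from Yandex or the leaked files.
CLASS_WEIGHTS: Dict[str, Dict[str, float]] = {
    "navigational": {"text_relevance": 0.3, "link_score": 0.6, "freshness": 0.1},
    "news": {"text_relevance": 0.4, "link_score": 0.1, "freshness": 0.5},
}


def classify_query(query: str) -> str:
    """Toy classifier: queries containing 'news' or 'today' count as news."""
    words = set(query.lower().split())
    return "news" if words & {"news", "today"} else "navigational"


def score(query: str, features: Dict[str, float]) -> float:
    """Score a document with the weight table chosen for the query's class."""
    weights = CLASS_WEIGHTS[classify_query(query)]
    return sum(w * features.get(name, 0.0) for name, w in weights.items())


# The same document features score differently depending on the query class.
doc = {"text_relevance": 1.0, "link_score": 0.0, "freshness": 1.0}
news_score = score("yandex news", doc)
nav_score = score("yandex login", doc)
```

Because each class has its own weight table, retuning the "news" weights leaves every "navigational" score unchanged, which is the isolation property the paragraph above describes.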

MatrixNet is important context for the Yandex ranking factor documents, which are hard to interpret without some understanding of the Yandex algorithm. An article outlining Yandex's Artificial Intelligence & Machine Learning Algorithms covers that background. Dominic Woodman (@dom_woodman) has made some interesting observations about the leak and found that some of the factors align with established SEO practices, such as varying anchor text.

Alex Buraks (@alex_buraks) has recently published a Twitter thread about the importance of internal link optimization for SEO purposes. Google’s John Mueller has long encouraged publishers to make sure important pages are prominently linked to, and discourages burying them deep within the site architecture. He stated in 2020: "[So what will happen is], we’ll see the home page is really important, things linked from the home page are generally pretty important as well....as it moves away from the home page we’ll think probably this is less critical." This suggests that pages linked from the main pages through which visitors enter a site will be seen as more important than pages buried deeper.

John Mueller of Google recently addressed the question of whether crawl depth is a ranking factor. He clarified that it is not a ranking factor, but rather a signal to Google of which pages are important. Alex Buraks then cited a Yandex rule that uses crawl depth from the home page as a ranking rule, suggesting that important pages should be kept close to the main page (one click away) and that less important pages should be three clicks or fewer away. This fits the idea that a page is treated as less important the further it sits from the home page. There are also Google research papers, such as the Reasonable Surfer Model and the Random Surfer Model, which calculate the probability of a random surfer ending up on a given webpage by following links.
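Click depth from the home page is straightforward to measure on your own site. The sketch below uses a breadth-first search over a small, made-up internal link graph; the URLs and the three-click threshold are illustrative assumptions based on the rule of thumb above, not values taken from the leaked files.

```python
from collections import deque
from typing import Dict, List


def click_depths(links: Dict[str, List[str]], home: str) -> Dict[str, int]:
    """Return the minimum number of clicks from `home` to each reachable page."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:  # first visit is the shortest path in BFS
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


# Hypothetical site structure: each key links to the pages in its list.
site = {
    "/": ["/products", "/blog"],
    "/products": ["/products/widget"],
    "/blog": ["/blog/post-1"],
    "/blog/post-1": ["/blog/archive/old-post"],
    "/blog/archive/old-post": ["/blog/archive/older-post"],
}

depths = click_depths(site, "/")
# Pages more than three clicks from the home page.
too_deep = sorted(page for page, depth in depths.items() if depth > 3)
```

In this example only `/blog/archive/older-post` ends up more than three clicks from the home page, so it is the one page a site owner following the rule of thumb might want to link from somewhere higher in the structure.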

Alex Buraks, an SEO expert, recently tweeted that backlinks from main pages are more important than those from internal pages. His tweet links to a picture showing a diagram representing this statement. This rule of thumb is important for SEO as it helps ensure that important content is kept close to the home page or inner pages that attract inbound links.

Analysis of the leak is still in its early stages, but it has the potential to give search marketers a better understanding of how search engines work. Through further investigation, people may gain greater insight into the various components that drive them.
