A relationship between two entities is the building block of a semantic connection that is used by Search Engines to build the semantic web and knowledge graphs. Using Entity relationships, Google (and other Search Engines) are able to provide the best user experience by connecting the search query to various existing Knowledge Graphs using NLP algorithms that analyze webpage content.
What is an Entity?
An entity is a specific, identifiable thing that is represented in the content of a web page. This can include individuals, places, organizations, and other distinct concepts. Google uses various methods, such as structured data and natural language processing, to identify and understand entities within web content.
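To make the structured-data route more concrete, here is a minimal sketch of how a page might describe an entity with schema.org vocabulary in JSON-LD. The `Person` type and the `name`, `jobTitle`, and `sameAs` properties are real schema.org terms; the choice of entity and the helper name `to_script_tag` are assumptions made for illustration only.

```python
import json

# Illustrative entity description using schema.org's Person type.
person = {
    "@context": "https://schema.org",
    "@type": "Person",
    "name": "Brad Pitt",
    "jobTitle": "Actor",
    "sameAs": ["https://en.wikipedia.org/wiki/Brad_Pitt"],
}

def to_script_tag(entity):
    """Wrap an entity description in the script tag used for JSON-LD markup."""
    body = json.dumps(entity, indent=2)
    return f'<script type="application/ld+json">\n{body}\n</script>'

print(to_script_tag(person))
```

The printed tag is what would be placed in the page's HTML so that crawlers can read the entity description directly instead of inferring it from the prose.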
How do entities relate to one another?
By recognizing and connecting entities, Google aims to provide more relevant and informative search results to users. But what are the criteria for their relationship?
Introducing Semantic Triples
A semantic triple is a set of three components that codifies a statement about semantic data in the form of a subject–predicate–object expression. For example, “Brad Pitt acted in Fury” can be represented as a semantic triple, with “Brad Pitt” as the subject, “acted in” as the predicate, and “Fury” as the object. This format enables knowledge to be represented in a machine-readable way, and each part of a semantic triple can be identified by a unique URI.
From these triples, we can start to relate entities to one another. For example, if we have the triples “Brad Pitt acted in Mr. & Mrs. Smith” and “Angelina Jolie acted in Mr. & Mrs. Smith”, we can infer that Brad Pitt and Angelina Jolie worked together on that movie. Using this principle, we can connect two entities based on the fact that they:
- Appear in the same media
- Do the same job
- Appear in the same news articles
- Appear in the same movies
- Have interconnected personal social accounts
- Are mentioned together with commonly used keywords
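The inference described above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: triples are plain tuples, the sample data is hand-picked, and the function name `related_entities` is hypothetical. Real systems use URIs and far richer reasoning.

```python
from collections import defaultdict
from itertools import combinations

# Each triple is (subject, predicate, object). The data is illustrative only.
triples = [
    ("Brad Pitt", "acted in", "Fury"),
    ("Shia LaBeouf", "acted in", "Fury"),
    ("Brad Pitt", "acted in", "Mr. & Mrs. Smith"),
    ("Angelina Jolie", "acted in", "Mr. & Mrs. Smith"),
]

def related_entities(triples):
    """Map each pair of subjects to the set of objects they share."""
    subjects_by_object = defaultdict(set)
    for subject, _predicate, obj in triples:
        subjects_by_object[obj].add(subject)

    shared = defaultdict(set)
    for obj, subjects in subjects_by_object.items():
        # Every pair of subjects pointing at the same object is related.
        for a, b in combinations(sorted(subjects), 2):
            shared[(a, b)].add(obj)
    return dict(shared)

relations = related_entities(triples)
for pair, objects in relations.items():
    print(pair, "share", objects)
```

Two subjects become related as soon as they point to the same object, which is the same reasoning as "they appear in the same movies" in the list above.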
Over the years, the Web became filled with content. Search Engines improved in processing webpage content and site metadata, using sophisticated algorithms that rely on NLP techniques, Semantic Triples, N-grams, Knowledge Graphs, Entity Vectors, Semantic (schema) markup, etc.
Why does Google analyze Entities and the relationships between them?
There are a few reasons why Google does this:
1. Adding entities into the entity database
Over the years, Google has collected information on many entities found on various web pages on the net.
Many sites were used for entity identification, but none matched Wikipedia in this process.
Google relied heavily on Wikipedia pages to identify entities. By also looking at how Wikipedia pages link from one entity to another, Google was able to establish both close and distant relationships between entities.
Today, there are billions of entities in Google’s database. You can check this database for yourself using Google’s API tool for developers.
2. Building Knowledge Graphs
Google’s Knowledge Graph is a database of billions of facts about people, places, and things that allows Google to answer factual questions such as “How tall is Brad Pitt?” or “What movies did Brad Pitt star in?”. The Knowledge Graph is built by compiling factual information from a variety of sources, including public sources, licensed data, and content owners who suggest changes to knowledge panels they’ve claimed. Google’s automated systems strive to surface publicly known, factual information when it’s determined to be useful.
Knowledge Graphs are, among other things, based on earlier mentioned semantic triples.
Are you wondering if there is a knowledge graph for an entity you have created? You can test that using the following tool:
3. Semantic Connections
The strength of a relationship between two entities can be determined by the frequency of the triples that relate to them. This ‘semantic triple’ is what connects two different entities.
The Semantic Web is an extension of the World Wide Web that involves a set of standards and technologies allowing machines to understand the meaning of web content. It is important because it enables the creation of intelligent applications that can automatically process and integrate data from multiple sources, making it easier to find, share, and reuse information on the web.
The Semantic Web connects entities by providing a common framework for representing and sharing data about them. This was possible due to the development of two specific techniques:
- Vector Space Models: used for establishing relevance between documents
- Semantic Schema Markup: a shortcut for crawlers to use to obtain entity information
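As a rough illustration of the vector space idea, the sketch below turns documents into term-frequency vectors and compares them with cosine similarity. It is a simplified textbook formulation, not a description of how any particular search engine weights terms; the sample documents and function names are assumptions made for the example.

```python
import math
from collections import Counter

def term_vector(text):
    """Turn a document into a bag-of-words term-frequency vector."""
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term vectors (0.0 to 1.0)."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

doc_actor = term_vector("brad pitt actor film fury")
doc_movie = term_vector("fury film war actor cast")
doc_recipe = term_vector("pasta tomato basil recipe")

print(cosine_similarity(doc_actor, doc_movie))   # documents sharing terms
print(cosine_similarity(doc_actor, doc_recipe))  # documents with no shared terms
```

Documents that share vocabulary score closer to 1, while documents with no terms in common score 0, which is the sense in which vectors establish relevance between documents.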
Before the Semantic Web (Entity Search) was created, search engines relied on directories and text indexes.
Here’s how Search Engines developed over time:
| Search type | Method of Collection | Organization and indexing | Retrieval |
|---|---|---|---|
| Directories | Automated crawlers to gather a massive number of records | Organization by categories; manual entry | Simple query matching against a pre-organized set of records |
| Text Index | Automated crawlers to gather a massive number of records | Keyword-based and metadata indexing using algorithms | Keyword and metadata search to identify relevant documents |
| Semantic (Entity) Search, e.g. Google (advanced form) | Automated data collection with an emphasis on understanding the context | Indexing based on entities and their relationships, use of vectors and knowledge graphs | Establishing document relevance through context and semantic relationships, enhanced by knowledge graphs |
By using semantic metadata to describe the relationships between entities, the Semantic Web enables machines to understand the meaning of web content and make inferences about it.
For example, if there are many triples that state that Brad Pitt and Angelina Jolie worked together, then we can infer that Brad and Angelina have a strong relationship.
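The idea that more triples mean a stronger relationship can be sketched as a simple count. This is an assumption-laden toy: the triples are illustrative, every triple counts equally, and `relationship_strength` is a hypothetical helper rather than a known scoring formula.

```python
from collections import Counter

# Illustrative triples linking pairs of people.
observed_triples = [
    ("Brad Pitt", "co-starred with", "Angelina Jolie"),
    ("Angelina Jolie", "directed", "Brad Pitt"),
    ("Brad Pitt", "was married to", "Angelina Jolie"),
    ("Brad Pitt", "co-starred with", "Shia LaBeouf"),
]

def relationship_strength(triples):
    """Count how many triples connect each unordered pair of entities."""
    strength = Counter()
    for subject, _predicate, obj in triples:
        pair = tuple(sorted((subject, obj)))
        strength[pair] += 1
    return strength

strength = relationship_strength(observed_triples)
for pair, count in strength.most_common():
    print(pair, count)
```

Sorting each pair makes the count direction-independent, so a triple from Angelina to Brad strengthens the same relationship as one from Brad to Angelina.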
PageRank, which is based on the number of quality links, helps Google determine entity relationships. Based on that, it can serve related results and build new knowledge panels.
There are many tools, such as text optimizer and seolyze, that can help you identify related keywords and the nature of the relation between them, either by looking at keyword frequency in a text document or by parsing web page sections and categorizing them into keyword topics.
If you wish to see which knowledge graph entries relate to a specific entity, you can test whether a knowledge panel exists for it:
Here we can see that there are three persons that match the query, but Brad Pitt, the actor, has the highest score, a number that represents the importance assigned by the search engine to each entity in response to a particular query.
How to determine the Entity Relation Score
The relation score refers to a numerical value that Google assigns to the relevance or importance of the search result in relation to the search query. This score helps in ranking the search results, with higher scores generally indicating greater relevance or prominence in the context of the search. It’s a part of Google’s Knowledge Graph, which aims to enhance search results with semantic search information.
There is a tool that does this: it represents the knowledge panel in the form of schema markup.
In the returned code, among other things, you can find the entity relation score.
"name": "Brad Pitt"
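The article does not name the tool, so the following sketch assumes Google's Knowledge Graph Search API, which returns entities as schema.org-style JSON-LD with a `resultScore` on each list element. The endpoint and the `query`, `key`, and `limit` parameters belong to that API; the helper names, the API key placeholder, and the sample response (including its score) are made up for illustration.

```python
from urllib.parse import urlencode

KG_ENDPOINT = "https://kgsearch.googleapis.com/v1/entities:search"

def build_search_url(query, api_key, limit=3):
    """Build a Knowledge Graph Search API request URL for a query."""
    params = {"query": query, "key": api_key, "limit": limit}
    return f"{KG_ENDPOINT}?{urlencode(params)}"

def extract_scores(response):
    """Return (name, description, resultScore) for each returned entity."""
    rows = []
    for element in response.get("itemListElement", []):
        result = element.get("result", {})
        rows.append(
            (result.get("name"), result.get("description"), element.get("resultScore"))
        )
    return rows

# Hand-written sample in the shape of a response; the score is a placeholder.
sample_response = {
    "@type": "ItemList",
    "itemListElement": [
        {
            "@type": "EntitySearchResult",
            "result": {
                "@type": ["Person", "Thing"],
                "name": "Brad Pitt",
                "description": "American actor",
            },
            "resultScore": 1234.5,
        }
    ],
}

print(build_search_url("Brad Pitt", "YOUR_API_KEY"))
print(extract_scores(sample_response))
```

Fetching the URL with a valid key would return a document of this shape; the parsing step is kept separate so it can be tried on saved responses without network access.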
The Relationship between Search Queries and Entities
Understanding this relationship is essential to the performance of today’s search engines, which strive to comprehend the user intent behind search queries and deliver accurate, relevant results by connecting queries with entities. Many factors come into play when determining this relationship:
Matching Entities against Queries
Matching entities against queries is not merely about keyword matching; it involves understanding the nuances of language, context, and user intent. By examining the result score, we can draw conclusions about the factors involved in matching entities with queries:
- The higher the score, the stronger the relation between the query and the entity
- Popular searches have higher result scores
- Well-developed knowledge graphs, the ones for which Google has the most information from trusted sources, have much higher result scores than underdeveloped ones
Result Score for Entity retrieval for well-developed knowledge graphs
As previously stated, comprehensive knowledge graphs provide higher result scores. Google provides better search results by developing strong links between search queries and entities using these thorough graphs. Using a result score for retrieval provides a better user experience.
Brad Pitt, the American actor and film producer, has a result score of almost 30,000, while Brad Pitt, the Australian boxer, has a low score of 210.
This tells Google to serve the actor’s knowledge graph when someone searches for the query “Brad Pitt”, since it is more relevant to the user.
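The disambiguation step can be sketched as picking the candidate with the highest result score. The scores below are taken from the figures quoted above, with "almost 30,000" rounded to 30000 as a placeholder; the data structure and the `best_match` helper are assumptions for illustration, not Google's actual selection logic.

```python
# Candidate entities for the query "Brad Pitt"; scores as quoted in the text,
# with 30000 standing in for "almost 30,000".
candidates = [
    {"name": "Brad Pitt", "description": "American actor and film producer", "resultScore": 30000},
    {"name": "Brad Pitt", "description": "Australian boxer", "resultScore": 210},
]

def best_match(candidates):
    """Pick the entity with the highest result score for the query."""
    return max(candidates, key=lambda c: c["resultScore"])

winner = best_match(candidates)
print(winner["description"], winner["resultScore"])
```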
Relationship Strength between Different Entities based on the Number of Graph Links
Relationship strength between different entities can be gauged by analyzing the number of graph links they share. The quantity and quality of external links directed towards an entity signify its interconnectedness and influence within a network.
From the example, we see that Angelina Jolie’s knowledge graph is the closest to Brad Pitt’s. This is based on the number of links going from Brad’s knowledge graph to Angelina’s knowledge graph and vice versa. A higher number of links between two entities indicates a stronger relationship.
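A minimal sketch of this link-counting idea follows. The link list is hypothetical, link quality is ignored, and `closest_entities` is an illustrative helper; it only shows how counting links in both directions yields a ranking of related entities.

```python
from collections import Counter

# Directed links between entity graphs; the counts are hypothetical.
links = [
    ("Brad Pitt", "Angelina Jolie"),
    ("Brad Pitt", "Angelina Jolie"),
    ("Angelina Jolie", "Brad Pitt"),
    ("Brad Pitt", "George Clooney"),
]

def closest_entities(entity, links):
    """Rank other entities by links to and from `entity`, in both directions."""
    counts = Counter()
    for source, target in links:
        if source == entity:
            counts[target] += 1
        elif target == entity:
            counts[source] += 1
    return counts.most_common()

print(closest_entities("Brad Pitt", links))
```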
The Role of Page Rank
PageRank is a patented algorithm, developed by Google’s founders, that measures the importance of a web page based on the number and quality of links pointing to it. The algorithm treats incoming links as votes, with pages that receive more high-quality links deemed more significant in search results. It’s based on:
- Number of links
- Domain reputation and topical authority: links from high-authority sites in your field carry more weight than links from arbitrary sites. Quality matters more than quantity.
- Anchor Text
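To show how "links as votes" works numerically, here is a sketch of the classic published PageRank formulation using power iteration. It models link structure only: anchor text and topical authority from the list above are not represented, and it should not be read as Google's current production system. The tiny link graph, the damping factor of 0.85, and the iteration count are assumptions chosen for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict of page -> list of outgoing links."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with the baseline "random jump" share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its vote evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank

# Hypothetical four-page web: a, b, and d all link to c; c links back to a.
web = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(web)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

Page c ranks highest because it receives the most votes, and page a outranks b and d because its single incoming link comes from the highest-ranked page, which is the "quality over quantity" effect.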
1. Anchor Text for Context
Anchor text helps to connect two entities because it provides semantic clues about the content of the destination page, assisting search engines in determining relevancy and ranking for specific keywords. In the context of the semantic web, anchor text helps Google understand the topic and context of linked pages, providing contextual clues about their content.
Whether it’s an exact match, semantic match, or long-tail anchor text, it helps connect entities. Anchor texts are especially important if they are in the form of semantic triples.
2. PageRank helps Knowledge Graphs improve over time
Aside from helping rank results based on backlinks and anchor text, PageRank also helps improve the Knowledge Graph. When high-authority, topically relevant sites point to one entity, they provide more information about it and help build a more complete knowledge graph. Anchor text complements the Knowledge Graph since it is an indication of a semantic connection.
This keeps the knowledge graphs constantly enriched and helps Google serve better results after each core update, improving the user experience.