SEO Theories
Be sure to read the SEO Theory blog, written by Michael Martinez.
Everyone develops a hypothesis or theory about the ways search engines work. The ideas proposed here can only be tested in a limited fashion. They will probably never be substantiated by the search engines.
Google's Supplemental Results
Check out a fuller discussion of Google's Supplemental Results.
I have on a couple of occasions asked Matt Cutts to explain what Google's supplemental results index is used for. He has understandably provided very oblique, relatively uninformative answers. Matt did say, however, that Google places fewer restrictions on the contents of the supplemental index. I suspect that means Google doesn't apply its filters to the supplemental index, or perhaps it only applies some filters (such as the adult content filter used for SafeSearch). Matt also mentioned that the supplemental index has its own crawler. The supplemental index is therefore built separately from the main index.
According to Matt's Datacenter Comments video from late August 2006, Google apparently redesigned its supplemental index architecture and software during the summer. Naturally, Matt did not explain what changes were made or why. However, I have noticed that many pages showing as "supplemental results" cannot be found with URL reference searches. That is, Google knows about those pages but it doesn't acknowledge any links to them. Nor can these pages be searched using the "site:" command to limit queries to just one URL.
I don't believe pages in the supplemental index have had any anchor text or PageRank assigned to them (Matt suggests this in a comment on his post Video: Datacenter comments). However, since these pages are being returned in normal query results, it appears that their contents are still being evaluated for relevance to user searches. It looks as though the supplemental pages are being evaluated without any linkage data being taken into consideration.
My present feeling is that pages with weak linkage (few trusted links) are more likely to be returned as supplemental results than pages with strong linkage (many trusted links). The main index is rebuilt more often than the supplemental index, according to Matt. If Google only builds the main index through vetted or "trusted" linkage, it will be less likely to reinclude pages that have few trusted links.
Why Ask has such a small index
The Ask.com search engine produces relatively few results for many commercial queries when compared with results from Yahoo!, MSN, and Google. Ask's current technology is named ExpertRank, but it appears to have evolved from Jon Kleinberg's Hypertext-Induced Topic Search (usually called HITS), which served as the basis for IBM's CLEVER search engine.
HITS identifies authorities (pages pointed to by many hubs) and hubs (pages that point to many authorities). Intuitively, HITS looks like a chicken-and-egg scenario: which does it find first, authorities or hubs? Most likely, a HITS implementation separates pages into three groups: those pages with the most inbound links, those pages with the most outbound links, and those pages that fall into neither of the first two categories. Some refinement must then occur, grouping hubs with authorities. HITS assumes that authoritative documents rarely link directly to each other, but that hubs link to many of them.
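For comparison, Kleinberg's published formulation sidesteps the chicken-and-egg question by giving every page both a hub score and an authority score and updating the two in alternation until they settle. The following is a minimal Python sketch of that iteration. The function names, the sample link graph, and the fixed iteration count are assumptions made for illustration, and the sketch says nothing about how CLEVER or Ask actually implement it.

```python
import math


def normalize(scores):
    """Scale scores so that their squares sum to 1."""
    norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
    return {page: v / norm for page, v in scores.items()}


def hits(links, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority score: sum of the hub scores of pages linking in.
        new_auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                new_auth[dst] += hub[src]
        # Hub score: sum of the authority scores of pages linked to.
        new_hub = {p: sum(new_auth[d] for d in links.get(p, ())) for p in pages}
        auth = normalize(new_auth)
        hub = normalize(new_hub)
    return hub, auth


# Hypothetical link graph: h1, h2, and x act as hubs; a1..a3 as authorities.
sample_links = {
    "h1": ["a1", "a2"],
    "h2": ["a1", "a2", "a3"],
    "x": ["a3"],
}
hub, auth = hits(sample_links)
best_hub = max(hub, key=hub.get)
```

In this sample, h2 ends up as the strongest hub because it points to every authority, and a1 outranks a3 because the hubs pointing to a1 are themselves stronger.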
CLEVER uses on-page content to refine the results of HITS. HITS scans on-page content to select its base set of pages before sorting them into authorities, hubs, and remaining pages. CLEVER most likely looks more closely at the relationships between the hubs and authorities. The best hubs provide more information about the authorities they point to. The best authorities provide more information about their topics and they point to good hubs. Intuitively, both good hubs and good authorities should have more content relevant to the query than poor hubs and authorities.
Ask's ExpertRank apparently attempts to compensate for HITS' inability to distinguish between multiple meanings for words. "Jaguar" is a popular example, as this word is associated with animals, automobiles, Aztec culture and mythology, and discussions about the limitations of HITS. Ask will divide its collection of hubs and authorities into groups (called communities) that appear to be related more closely to just one of the specific meanings of the word. A LocalRank/PageRank-like scoring is then used to determine which of the pages is the most important within each community. That is, the community members' linkage to each other is used to determine how they collectively respect each other.
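The community idea can be illustrated with a rough Python sketch. Ask has not published how ExpertRank forms communities or scores their members, so everything here is an assumption: communities are taken to be connected groups of pages (ignoring link direction), and a count of inbound links from fellow members stands in for the LocalRank/PageRank-like score. The page names are hypothetical.

```python
from collections import defaultdict


def communities(links):
    """Group pages into connected components, ignoring link direction."""
    neighbors = defaultdict(set)
    for src, targets in links.items():
        neighbors[src]  # register the source page even if it has no links
        for dst in targets:
            neighbors[src].add(dst)
            neighbors[dst].add(src)
    seen, groups = set(), []
    for start in list(neighbors):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            page = stack.pop()
            if page in group:
                continue
            group.add(page)
            stack.extend(neighbors[page] - group)
        seen |= group
        groups.append(group)
    return groups


def rank_within(group, links):
    """Order members by inbound links received from the same community."""
    score = {page: 0 for page in group}
    for src in group:
        for dst in links.get(src, ()):
            if dst in group:
                score[dst] += 1
    return sorted(group, key=lambda page: (-score[page], page))


# Hypothetical pages matching the query "jaguar".
jaguar_links = {
    "cat-hub": ["bigcats.example", "zoo.example"],
    "zoo.example": ["bigcats.example"],
    "car-hub": ["jaguar-cars.example", "dealer.example"],
    "dealer.example": ["jaguar-cars.example"],
}
groups = communities(jaguar_links)
top_pages = sorted(rank_within(group, jaguar_links)[0] for group in groups)
```

Each meaning of the word gets its own winner here: one page for the animal community and one for the automobile community, chosen only by links from pages in the same community.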
In essence, Ask is really only interested in highly linked documents. If they are hubs, they must be recognized by authorities. If they are authorities, they must be recognized by hubs that Ask trusts. Ask crawls many more pages than it indexes because it needs to determine where the links are pointing. There must therefore be a much larger index against which Ask builds the query-resolving index. This structure may seem familiar to people who recall Inktomi's old 2-index system, where only the smaller index was used to satisfy queries.
How does Google determine trust?
TrustRank came out of research by Stanford and Yahoo!, not Google. It is highly unlikely that either Yahoo! or Google actually uses TrustRank, as subsequent research has identified problems with it. Its chief flaw is that it is completely vulnerable to link brokerage.
In my paper On the Googleness of Being, originally published at Spider-food, I suggested that Google may recognize Trusted Content Sites on the basis of performance over time (no accrued penalties or filtration).
Matt Cutts has indicated on more than one occasion that Google strips pages of their ability to confer Reputation if they are caught selling links. Matt uses Reputation to refer to "PageRank and [link anchor text]". Link anchor text tells Google that page A thinks page B is relevant to specific words. SEOs and spammers have abused link anchor text extensively.
Perhaps Google allows pages to confer Trust through a viral process combined with a threshold. In order to confer trust, a page needs to link out to a preponderance of trusted or potentially trustworthy pages (that is, pages which have not been flagged as untrustworthy). This hypothesis supposes there must be three states of trust: Trusted, UnTrusted, and Unknown. Unknown pages can earn trust if they are pointed to by Trusted pages and if they point to Trusted pages. Hence, who you link to becomes at least as important as who links to you. If your page links mostly to Unknown or UnTrusted pages, it may not become Trusted until many of those other pages become Trusted.
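The hypothesis can be sketched in a few lines of Python. This is only an illustration of the three-state idea, not a description of anything Google is known to do. The 0.5 threshold, the rule that more than half of a page's outbound links must point to Trusted pages, the function name, and the sample pages are all assumptions.

```python
TRUSTED, UNTRUSTED, UNKNOWN = "Trusted", "UnTrusted", "Unknown"


def propagate_trust(links, seed_state, threshold=0.5):
    """Promote Unknown pages to Trusted until no more pages qualify."""
    pages = set(links) | {dst for targets in links.values() for dst in targets}
    state = {page: seed_state.get(page, UNKNOWN) for page in pages}
    changed = True
    while changed:
        changed = False
        for page in sorted(pages):
            if state[page] != UNKNOWN:
                continue
            out = links.get(page, [])
            if not out:
                continue  # a page with no outbound links cannot qualify here
            # Condition 1: at least one Trusted page links to this page.
            endorsed = any(
                state[src] == TRUSTED and page in links.get(src, ())
                for src in pages
            )
            # Condition 2: most outbound links point to Trusted pages.
            share = sum(state[dst] == TRUSTED for dst in out) / len(out)
            if endorsed and share > threshold:
                state[page] = TRUSTED
                changed = True
    return state


# Hypothetical pages: t1 and t2 start Trusted, spam starts UnTrusted.
sample_links = {
    "t1": ["t2", "a"],
    "t2": ["t1", "d"],
    "a": ["t1", "t2", "b"],
    "b": ["a", "spam", "c"],
    "c": ["spam"],
    "d": ["a", "t1"],
}
seeds = {"t1": TRUSTED, "t2": TRUSTED, "spam": UNTRUSTED}
result = propagate_trust(sample_links, seeds)
```

In this sample, pages a and d earn trust because Trusted pages link to them and most of their own links point to Trusted pages. Page b stays Unknown even though a links to it, because two of its three outbound links point to Unknown or UnTrusted pages.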
New paper: Who Does Google Trust Now? September 9, 2006: A look at how Google may be determining trust for Web sites.
SEO Services Web site and all contents © Copyright 2006-2007 Michael L. Martinez. All rights reserved.