CompositeDocIndexingInfo

Composite DocIndexing

GoogleApi.ContentWarehouse.V1.Model.CompositeDocIndexingInfo

SEO Impact: Critical (10 out of 10)
Contains information mostly used within indexing (e.g. not used for building the production serving shards). Most of this data is generated only in Alexandria, however there are exceptions.

SEO Analysis

AI Generated

Part of Google's Composite Document system, which brings together all known information about a URL into a single unified document representation. This includes content, links, quality signals, and metadata from multiple sources. The composite document is the complete picture Google has of a page and serves as the input for ranking algorithms.

Actionable Insights for SEOs

  • Monitor for changes in rankings that may correlate with updates to this system
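  • Keep important pages out of the states this module records: roboted (crawlStatus, convertToRobotedReason), detected as error pages (errorType), or served from preserved stale content (contentProtected)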
  • Optimize crawl budget by fixing broken links and reducing redirect chains
  • Use robots.txt and sitemap.xml effectively to guide crawling
  • Monitor Google Search Console for crawl errors and indexing issues

Attributes (23)
Default: nil
Full type: GoogleApi.ContentWarehouse.V1.Model.IndexingDocjoinerCDocBuildInfo.t

Holds extra information for building a final cdoc from the raw cdoc and goldmine annotations.

contentProtected (boolean)
Default: nil

Whether the current page is under content protection, i.e. the page was crawled as an error page, but its last known good content is preserved and its crawl_status is kept as converter.CrawlStatus::CONTENT.
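The content-protection behaviour can be sketched as follows. This is a minimal illustration, not Google's converter logic: `Crawl`, `IndexedDoc`, `index_crawl`, the `is_error_page` flag, and the `"ERROR"` status string are hypothetical stand-ins, and the trigger condition (an error-page crawl plus previously stored good content) is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Crawl:
    content: str
    is_error_page: bool  # hypothetical: result of an error-page detector


@dataclass(frozen=True)
class IndexedDoc:
    content: str
    crawl_status: str  # hypothetical string stand-in for converter.CrawlStatus
    content_protected: bool


def index_crawl(crawl: Crawl, last_good_content: Optional[str]) -> IndexedDoc:
    """Build the indexed doc for a fresh crawl, applying content protection (sketch)."""
    if crawl.is_error_page and last_good_content is not None:
        # Protected: keep the last known good content and report CONTENT.
        return IndexedDoc(last_good_content, "CONTENT", content_protected=True)
    if crawl.is_error_page:
        # Nothing to fall back on, so the (hypothetical) error status stands.
        return IndexedDoc(crawl.content, "ERROR", content_protected=False)
    return IndexedDoc(crawl.content, "CONTENT", content_protected=False)
```

A normal crawl simply replaces the content; only an error-page crawl with something to fall back on sets the protection flag.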

convertToRobotedReason (integer)
Default: nil

If set, indicates that the crawl status was converted to ROBOTED for the reason specified by the enum value in converter.RobotedReasons.ConvertToRobotedReasons. See indexing/converter/proto/converter.proto for details. If unset, then the document was not converted to roboted, and if the document crawl status is ROBOTED, then the document is disallowed (at least to Google) in robots.txt.

crawlStatus (integer)
Default: nil

One of the enum values in converter.CrawlStatus.State (see indexing/converter/proto/converter.proto for details). Default is converter.CrawlStatus::CONTENT. The document is roboted if the value is converter.CrawlStatus::ROBOTED.
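Read together, the crawlStatus and convertToRobotedReason descriptions define how to tell a robots.txt disallow apart from a converted ROBOTED status. A minimal sketch of that decision, assuming string stand-ins for the converter.CrawlStatus values and a plain integer for the ConvertToRobotedReasons enum (whose values are not listed here); `explain_roboted` is a hypothetical helper.

```python
from typing import Optional

# Hypothetical string stand-ins for converter.CrawlStatus values.
CONTENT = "CONTENT"
ROBOTED = "ROBOTED"


def explain_roboted(crawl_status: Optional[str], convert_to_roboted_reason: Optional[int]) -> str:
    """Classify a document's roboted state from the two CompositeDocIndexingInfo fields."""
    status = crawl_status if crawl_status is not None else CONTENT  # documented default
    if convert_to_roboted_reason is not None:
        # Set: the status was converted to ROBOTED for an enumerated reason.
        return f"converted to ROBOTED (reason {convert_to_roboted_reason})"
    if status == ROBOTED:
        # Reason unset but status ROBOTED: disallowed in robots.txt (at least to Google).
        return "disallowed by robots.txt"
    return "not roboted"
```

The reason check comes first because a set reason is what distinguishes a converted status from a genuine robots.txt disallow; the integer in any example call is illustrative only.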

demotionTags (string)
Default: nil
Full type: list(String.t)

errorType (integer)
Default: nil

One of the enum values in converter.ErrorPageType (see indexing/converter/proto/error-page-detector-enum.proto for details). Default is converter::ERROR_PAGE_NONE.

freshdocsCorpora (string)
Default: nil
Full type: list(String.t)

hostid (string)
Default: nil
Full type: String.t

The host id of the document. Used chiefly to determine whether the document is part of a parked domain.

ieIdentifier (string)
Default: nil
Full type: String.t

A short descriptive string that helps identify the IE application or setup where this CDoc is generated, for example websearch_m3. This field exists for debugging purposes.

Default: nil
Full type: GoogleApi.ContentWarehouse.V1.Model.ImageSearchImageIndexingInfo.t

Indexing information about images (e.g. image links missing image data).

indexingTs (string)
Default: nil
Full type: String.t

The timestamp (the time since the Epoch, in microseconds) when the docjoin is exported from indexing. The main purpose of this field is to identify different versions of the same document.
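Because indexingTs is a count of microseconds since the Epoch carried as a string (presumably because the proto3 JSON mapping encodes 64-bit integers as strings), decoding it and ordering versions is plain integer work. A small sketch; both helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def micros_to_datetime(ts: str) -> datetime:
    """Decode a microseconds-since-Epoch string into a timezone-aware UTC datetime."""
    # timedelta(microseconds=...) keeps exact integer precision (no float division).
    return EPOCH + timedelta(microseconds=int(ts))


def newer_version(ts_a: str, ts_b: str) -> str:
    """Return whichever of two indexingTs values identifies the more recent export."""
    # Compare as integers: string comparison breaks when the lengths differ.
    return ts_a if int(ts_a) >= int(ts_b) else ts_b
```

For example, `micros_to_datetime("1700000000000000")` gives 2023-11-14 22:13:20 UTC.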

noLongerCanonicalTimestamp (string)
Default: nil
Full type: String.t

If set, the timestamp in microseconds when the URL stopped being canonical. This should never be set for exported canonical documents. The field is used by dups during a canonical flip, and by webmain when doc selection switches between desktop and mobile. Union respects this timestamp to prevent the old doc from being deleted until the new doc is picked up.
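One way to read the Union behaviour described above is that an old document is held back from deletion until its replacement has been picked up. The sketch below assumes, without confirmation from the source, that "picked up" can be represented as a microsecond timestamp compared against noLongerCanonicalTimestamp; `can_delete_old_doc` and its parameters are hypothetical.

```python
from typing import Optional


def can_delete_old_doc(no_longer_canonical_us: Optional[int], new_doc_picked_up_us: Optional[int]) -> bool:
    """Decide whether a former canonical may be deleted (hypothetical rule).

    Assumption: deletion is safe only once the new canonical was picked up
    at or after the moment the old URL stopped being canonical.
    """
    if no_longer_canonical_us is None:
        # Field unset: the document is still canonical, so nothing to delete.
        return False
    if new_doc_picked_up_us is None:
        # New doc not picked up yet: keep the old one.
        return False
    return new_doc_picked_up_us >= no_longer_canonical_us
```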

normalizedClickScore (number)
Default: nil

This score is calculated by re-mapping the back onto the partition's score distribution, such that the score represents the score of the equivalently ranked organically-selected document.
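The normalizedClickScore description amounts to rank-based (quantile) remapping: find where a score falls in one distribution and return the score at the same rank in another. A minimal nearest-rank sketch, assuming both distributions are available as plain lists of scores. Which score is remapped is missing from the description, so `source` is a hypothetical reference sample and `target` stands in for the partition's organically selected scores.

```python
from bisect import bisect_right
from typing import Sequence


def quantile_remap(score: float, source: Sequence[float], target: Sequence[float]) -> float:
    """Return the target score whose rank matches the rank of `score` in `source` (nearest-rank)."""
    src, tgt = sorted(source), sorted(target)
    if not src or not tgt:
        raise ValueError("both distributions must be non-empty")
    k = bisect_right(src, score)  # number of source scores <= score
    n, m = len(src), len(tgt)
    # Nearest-rank index ceil(k * m / n) - 1, in integer arithmetic to avoid float error.
    idx = max((k * m + n - 1) // n - 1, 0)
    return tgt[idx]
```

With `source = [1, 2, 3, 4]` and `target = [10, 20, 30, 40]`, a score of 2 is the second-ranked source value, so it maps to 20, the second-ranked target value.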

primaryVertical (string)
Default: nil
Full type: String.t

Vertical membership of the document:
  • primary_vertical is the vertical that initiated indexing of this document (or empty if the vertical was websearch).
  • verticals is the full list of verticals that contained this document (excluding websearch) at indexing time.
primary_vertical may or may not be an element of verticals because of vertical membership skew between ingestion time and indexing time. See go/one-indexing-for-web for more background.
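A minimal data-structure sketch of the two vertical fields, illustrating that an empty primary_vertical means websearch initiated indexing and that primary_vertical need not appear in verticals. The class, its methods, the "websearch" label, and the example vertical names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class VerticalMembership:
    primary_vertical: str = ""  # empty means websearch initiated indexing
    verticals: List[str] = field(default_factory=list)  # excludes websearch

    def initiating_vertical(self) -> str:
        # Hypothetical label for the empty-string (websearch) case.
        return self.primary_vertical or "websearch"

    def primary_in_verticals(self) -> bool:
        # May be False due to membership skew between ingestion and indexing time.
        return self.primary_vertical in self.verticals
```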

rawNavboost (integer)
Default: nil

The raw navboost count for the canonical url without aggregating the navboost from dup urls. This field is used when building forwarding map.
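To make the raw/aggregated distinction in rawNavboost concrete, the sketch below contrasts a canonical URL's own count with a total over the canonical and its duplicates. Summation is an assumption chosen for illustration; the source does not say how navboost is aggregated across dup URLs, and `aggregated_navboost`, the URLs, and the counts are made up.

```python
from typing import Dict, Iterable


def aggregated_navboost(canonical: str, dups: Iterable[str], raw_counts: Dict[str, int]) -> int:
    """Sum raw counts over a canonical URL and its duplicates (assumed aggregation)."""
    urls = {canonical, *dups}  # a set avoids double counting a URL listed twice
    return sum(raw_counts.get(u, 0) for u in urls)
```

In this picture, rawNavboost corresponds to `raw_counts[canonical]` alone, without the dup contributions.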

rowTimestamp (string)
Default: nil
Full type: String.t

The timestamp (the time since the Epoch, in microseconds) representing the doc version, used in downstream processing after Raffia. If it is not set, indexing_ts is used as row_timestamp. Reprocessing generally sets this timestamp slightly newer than indexing_ts so that the reprocessed version overwrites the old data in storage.
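The rowTimestamp fallback and the reprocessing behaviour can be sketched as a last-writer-wins store keyed by URL. Only the fallback to indexing_ts comes from the field description; `DocVersion`, `write`, and the in-memory dict standing in for storage are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class DocVersion:
    url: str
    indexing_ts: int  # microseconds since the Epoch
    row_timestamp: Optional[int] = None  # microseconds since the Epoch; may be unset
    payload: str = ""

    def effective_row_timestamp(self) -> int:
        # Per the field description: fall back to indexing_ts when row_timestamp is unset.
        return self.row_timestamp if self.row_timestamp is not None else self.indexing_ts


def write(store: Dict[str, DocVersion], doc: DocVersion) -> None:
    """Keep the version with the newest effective row timestamp (assumed last-writer-wins)."""
    current = store.get(doc.url)
    # Strictly newer wins; on a tie the stored version is kept.
    if current is None or doc.effective_row_timestamp() > current.effective_row_timestamp():
        store[doc.url] = doc
```

A reprocessed copy with the same indexing_ts but a row_timestamp one microsecond later replaces the original, while an older export does not overwrite it.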

selectionTierRank (number)
Default: nil

Selection tier rank is a language-normalized score, ranging from 0 to 1, over the serving tier (Base, Zeppelins, Landfills) of this document.
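The source does not say how selectionTierRank is computed. As a hedged illustration of what a language-normalized 0 to 1 score within a serving tier could look like, the sketch below gives each document its percentile rank among documents in the same (tier, language) group; the grouping, the percentile formula, and all names are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def normalized_ranks(docs: List[Tuple[str, str, str, float]]) -> Dict[str, float]:
    """Percentile rank in [0, 1] of each doc's score within its (tier, language) group.

    docs: (doc_id, tier, language, raw_score) tuples; hypothetical inputs.
    """
    groups: Dict[Tuple[str, str], List[Tuple[float, str]]] = defaultdict(list)
    for doc_id, tier, lang, score in docs:
        groups[(tier, lang)].append((score, doc_id))
    ranks: Dict[str, float] = {}
    for members in groups.values():
        members.sort()  # ties are broken by doc_id in this sketch
        n = len(members)
        for i, (_, doc_id) in enumerate(members):
            # Lowest score -> 0.0, highest -> 1.0; a single-member group gets 1.0.
            ranks[doc_id] = i / (n - 1) if n > 1 else 1.0
    return ranks
```

Normalizing per language keeps a small-language corpus from being ranked against a much larger one.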

tracingId (string)
Default: nil
Full type: list(String.t)

Tracing IDs label a version of a URL for URL status tracking. This repeated field carries at most 10 tracing IDs (see go/rich-tracing-design for details). Fewer than 2% of base+uz cdocs carry this field. The major sources of tracing IDs are:
  • URLs pushed through the Indexing API
  • Index Metrics sampling URLs
Tracing IDs are written into cdocs by the Webmain Ramifier and consumed by the Union serving notification collector (see go/serving-notification-from-union).
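A sketch of maintaining the tracingId list under its documented cap of 10. Deduplication and dropping the oldest IDs when the cap is exceeded are assumptions; the source only states the maximum. `add_tracing_id` is hypothetical.

```python
from typing import List

MAX_TRACING_IDS = 10  # documented maximum for the repeated field


def add_tracing_id(ids: List[str], new_id: str) -> List[str]:
    """Return a new list with new_id appended, deduplicated and capped at 10.

    Assumption: when the cap is exceeded, the oldest IDs are dropped.
    """
    if new_id in ids:
        return list(ids)
    return (list(ids) + [new_id])[-MAX_TRACING_IDS:]
```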

Default: nil
Full type: GoogleApi.ContentWarehouse.V1.Model.CrawlerChangerateUrlChangerate.t

Changerate information for this doc (see crawler/changerate/changerate.proto for details).

Default: nil
Full type: GoogleApi.ContentWarehouse.V1.Model.CrawlerChangerateUrlHistory.t

Url change history for this doc (see crawler/changerate/changerate.proto for details). If a doc has more than 20 changes, only the last 20 are kept here to avoid adding too much data to its docjoin.
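Keeping only the last 20 changes is a bounded-history pattern; in Python a `collections.deque` with `maxlen` does exactly that. The `Change` record and its timestamp field are hypothetical; the real history is a CrawlerChangerateUrlHistory value whose internal layout is not shown here.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Iterable

MAX_CHANGES = 20  # per the field description: only the last 20 changes are kept


@dataclass(frozen=True)
class Change:
    timestamp_us: int  # hypothetical: when the change was observed


def build_history(changes: Iterable[Change]) -> Deque[Change]:
    """Keep only the most recent MAX_CHANGES changes, oldest first."""
    history: Deque[Change] = deque(maxlen=MAX_CHANGES)
    for change in sorted(changes, key=lambda c: c.timestamp_us):
        history.append(change)  # beyond maxlen, the oldest entry is discarded from the left
    return history
```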

Default: nil
Full type: GoogleApi.ContentWarehouse.V1.Model.IndexingSignalAggregatorUrlPatternSignals.t

UrlPatternSignals for this doc, used to compute document score in LTG (see indexing/signal_aggregator/proto/signal-aggregator.proto for details).

verticals (string)
Default: nil
Full type: list(String.t)

Default: nil
Full type: GoogleApi.ContentWarehouse.V1.Model.ImageRepositoryVideoIndexingInfo.t

Indexing info about videos.