CompositeDoc

Composite Doc (Document Processing)

GoogleApi.ContentWarehouse.V1.Model.CompositeDoc

SEO Impact: 9 out of 10 (Critical)
Protocol record used for collecting together all information about a document. Please consult go/dj-explorer for two basic questions about CompositeDoc:

  • Where should I look up certain information (e.g., pagerank, language)?
  • What does each field in CompositeDoc mean, and who should I contact if I have questions?

To add a new field to CompositeDoc, or to change an existing field's size significantly, please file a ticket at go/dj-new-field, fill in the necessary information, and get approval from the docjoin-access@ team. Next id: 194

SEO Analysis

AI Generated

Part of Google's Composite Document system, which brings together all known information about a URL into a single unified document representation. This includes content, links, quality signals, and metadata from multiple sources. The composite document is the complete picture Google has of a page and serves as the input for ranking algorithms.

Actionable Insights for SEOs

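  • Monitor ranking changes that may correlate with updates to the data gathered in this record, such as content, links, and quality signals
  • Consider how your content strategy aligns with the signals this record aggregates, since CompositeDoc is a container rather than a single signal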

Attributes

44 attributes
Default: nil
Full type: GoogleApi.ContentWarehouse.V1.Model.CompositeDocLocalizedVariations.t

Default: nil
Full type: list(GoogleApi.ContentWarehouse.V1.Model.IndexingConverterLocalizedAlternateName.t)

Localized alternate names are similar to alternate names, except that each one is associated with a language different from that of its canonical. This is the subset of webmaster-provided localized alternate names that are in the dup cluster of this document. Used during serving for swapping in the URL based on the regional and language preferences of the user.
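
The URL swap described here can be pictured with a small sketch. It is an illustration under stated assumptions, not the serving implementation: the dictionary keys `language` and `url` and the function name `pick_serving_url` are hypothetical, and matching on an exact language code is a simplification.

```python
def pick_serving_url(canonical_url, alternates, user_language):
    """Return the alternate URL matching the user's language, else the canonical.

    `alternates` is a list of dicts with hypothetical keys "language" and "url".
    """
    for alternate in alternates:
        if alternate.get("language") == user_language:
            return alternate["url"]
    # No alternate matches the preference, so keep the canonical URL.
    return canonical_url
```

A real system would presumably also weigh region and fall back across related locales; this sketch only shows the basic substitution.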

Default: nil
Full type: list(GoogleApi.ContentWarehouse.V1.Model.CompositeDocForwardingDup.t)

Default: nil
Full type: GoogleApi.ContentWarehouse.V1.Model.PerDocData.t

Default: nil
Full type: GoogleApi.ContentWarehouse.V1.Model.IndexingPrivacyAccessAccessRequirements.t

Contains necessary information to enforce row level Docjoin access control.

Default: nil
Full type: GoogleApi.ContentWarehouse.V1.Model.DocProperties.t

Default: nil
Full type: GoogleApi.ContentWarehouse.V1.Model.IndexingBadSSLCertificate.t

This field is present iff the page has a bad SSL certificate itself or in its redirect chain.

Default: nil
Full type: list(GoogleApi.ContentWarehouse.V1.Model.CompositeDocExtraDup.t)

subindexid (string)
Default: nil
Full type: list(String.t)

Default: nil
Full type: GoogleApi.ContentWarehouse.V1.Model.PtokenPToken.t

Contains information necessary to make policy decisions on the usage of the data associated with this cdoc.

Default: nil
Full type: GoogleApi.ContentWarehouse.V1.Model.IndexingConverterRichContentData.t

If present, indicates that some content was inserted, deleted, or replaced in the document's content (in CompositeDoc::doc::Content::Representation), and stores information about what was inserted, deleted, or replaced.

scaledIndyRank (integer)
Default: nil

to copy to per-doc

Default: nil
Full type: list(GoogleApi.ContentWarehouse.V1.Model.QualityProseCSEUrlInfo.t)

indexingIntermediate (string)
Default: nil
Full type: String.t

Serialized indexing intermediate data.

Default: nil
Full type: list(GoogleApi.ContentWarehouse.V1.Model.ImageRepositoryVideoProperties.t)

Info about videos embedded in the document.

Default: nil
Full type: GoogleApi.ContentWarehouse.V1.Model.CompositeDocIndexingInfo.t

Default: nil
Full type: GoogleApi.ContentWarehouse.V1.Model.LocalWWWInfo.t

storageRowTimestampMicros (string)
Default: nil
Full type: String.t

Row timestamp in CDoc storage.
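
Because the timestamp arrives as a string, a consumer has to parse it before doing any date arithmetic. The sketch below assumes the value is a decimal count of microseconds since the Unix epoch; the field name suggests microseconds, but the epoch is an assumption the source does not confirm. The helper name `micros_string_to_datetime` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: timestamps count microseconds from the Unix epoch in UTC.
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def micros_string_to_datetime(value):
    """Convert a decimal microsecond string into a timezone-aware datetime."""
    # Integer arithmetic via timedelta avoids float rounding at microsecond precision.
    return UNIX_EPOCH + timedelta(microseconds=int(value))
```

Adding a `timedelta` to the epoch keeps the full microsecond precision that a float-based conversion could lose.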

Default: nil
Full type: GoogleApi.ContentWarehouse.V1.Model.CompositeDocPartialUpdateInfo.t

Only present in partial cdocs.

Default: nil
Full type: GoogleApi.ContentWarehouse.V1.Model.CompositeDocAdditionalChecksums.t

Default: nil
Full type: GoogleApi.ContentWarehouse.V1.Model.Anchors.t

Mark as non-personal since no personal fields will be populated in anchors.link_additional_info and anchors.additional_info. For more details of Search personal data, see go/dma52-search-cdoc-fields.

Default: nil
Full type: GoogleApi.ContentWarehouse.V1.Model.RegistrationInfo.t

Information about the most recent creation and expiration of this domain. It's extracted from domainedge signal.

Default: nil
Full type: GoogleApi.ContentWarehouse.V1.Model.Proto2BridgeMessageSet.t

A generic container to hold document annotations and signals. For a full list of extensions live today, see go/wde.

docinfoPassthroughAttachments (Proto2BridgeMessageSet)
Default: nil
Full type: GoogleApi.ContentWarehouse.V1.Model.Proto2BridgeMessageSet.t

This message set is used for data pushed into the index using the signals framework that is never to be used in Mustang or TG Continuum scoring/snippeting code. Any protocol buffer stored in this message set is automatically returned in a docinfo response - it ends up in the "info" message set in the WWWSnippetResponse, so it can be used in post-doc twiddlers and for display in GWS with no code changes in Mustang or Teragoogle.

Default: nil
Full type: list(GoogleApi.ContentWarehouse.V1.Model.ImageData.t)

Info about "selected" images associated with the document for which we (already) have ImageData. For each image URL, some fixed number of documents are selected as web referrers for the image URL, and within those selected documents, we say the image is "selected". Within the remaining documents, we say the image is "rejected". Note that this distinction is slightly different from selected for indexing. Only images within doc_images where is_indexed_by_imagesearch is true will be selected for indexing. You can find the rejected images at composite_doc.doc_attachments().get(). You can find images that are selected, but for which we have no ImageData (yet) at composite_doc.image_indexing_info().selected_not_indexed_image_link()

Default: nil
Full type: list(GoogleApi.ContentWarehouse.V1.Model.CompositeDocIncludedContent.t)

Default: nil
Full type: GoogleApi.ContentWarehouse.V1.Model.ClassifierPornDocumentData.t

Porn related data used for image and web search porn classification as well as for diagnostics purposes.

urldate (string)
Default: nil
Full type: String.t

Date in the URL, extracted by quality/snippets/urldate/date-in-url.cc. This is given as midnight GMT on the date in question.
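
To make the "midnight GMT" convention concrete, here is a minimal sketch of pulling a date out of a URL path and normalizing it. It does not reproduce date-in-url.cc, whose rules are not described here: the `/YYYY/MM/DD/` pattern, the regular expression, and the function name `url_date_midnight_utc` are all assumptions made for illustration.

```python
import re
from datetime import datetime, timezone

# Hypothetical pattern: a /YYYY/M/D path segment run such as /2023/05/17/.
_DATE_IN_URL = re.compile(r"/(\d{4})/(\d{1,2})/(\d{1,2})(?:/|$)")


def url_date_midnight_utc(url):
    """Return midnight UTC on the date found in the URL, or None if absent."""
    match = _DATE_IN_URL.search(url)
    if match is None:
        return None
    year, month, day = (int(group) for group in match.groups())
    try:
        # Constructing the datetime without a time component yields 00:00:00.
        return datetime(year, month, day, tzinfo=timezone.utc)
    except ValueError:
        # Digits that do not form a real calendar date (e.g. month 13).
        return None
```

Storing only the date at midnight means two URLs published on the same day compare equal, regardless of the time they were crawled.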

Default: nil
Full type: GoogleApi.ContentWarehouse.V1.Model.IndexingEmbeddedContentEmbeddedContentInfo.t

Data produced by the embedded-content system. This is a thin message, containing only embedded_links_info data for the embedder and JavaScript/CSS embedded links (the embedded-content bigtable also contains snapshots, compressed document trees and all embedded link types). Provided using the index signal API.

Default: nil
Full type: GoogleApi.ContentWarehouse.V1.Model.RichsnippetsPageMap.t

Rich snippet extracted from the content of a document.

Default: nil
Full type: GoogleApi.ContentWarehouse.V1.Model.CompositeDocQualitySignals.t

Default: nil
Full type: GoogleApi.ContentWarehouse.V1.Model.IndexingDocjoinerAnchorStatistics.t

Mark as non-personal since it's an aggregation of anchors. For more details of Search personal data, see go/dma52-search-cdoc-fields.

Default: nil
Full type: list(GoogleApi.ContentWarehouse.V1.Model.CompositeDocAlternateName.t)

Default: nil
Full type: GoogleApi.ContentWarehouse.V1.Model.IndexingDocjoinerDataVersion.t

Contains the tracking version of various data fields in CompositeDoc.

Default: nil
Full type: GoogleApi.ContentWarehouse.V1.Model.QualityTimebasedSyntacticDate.t

url (string)
Default: nil
Full type: String.t

WARNING!!! The "url" field in CompositeDoc is optional and is usually missing: e.g., Docjoin CompositeDocs don't have CompositeDoc::url. Checking has_url() is often useful, so don't rely on CompositeDoc::url unless you're sure it is set. Usually you want to use CompositeDoc::doc::url instead.
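
The warning amounts to a lookup order: prefer the nested document URL and treat the top-level field as an optional fallback. A minimal sketch, assuming the record has been decoded into a plain dictionary with a nested `doc` entry; that shape and the function name `effective_url` are illustrative assumptions, not part of the API.

```python
def effective_url(cdoc):
    """Return doc.url when present, else the optional top-level url, else None.

    `cdoc` is assumed to be a dict decoded from a serialized CompositeDoc.
    """
    doc = cdoc.get("doc") or {}
    nested_url = doc.get("url")
    if nested_url:
        return nested_url
    # The top-level field is usually missing, so this may legitimately be None.
    return cdoc.get("url")
```

Callers should handle a `None` result explicitly rather than assume either field is populated.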

docjoinsOnSpannerCommitTimestampMicros (string)
Default: nil
Full type: String.t

The commit timestamp of a CDoc update to Docjoins on Spanner.

ContentChecksum96 (string)
Default: nil
Full type: String.t

Visible content checksum as computed by repository::parsehandler::checksum::Checksum96bitsParseHandler. The value is a Fprint96 in "key format" (i.e., by Fprint96::AsKey()).

Default: nil
Full type: GoogleApi.ContentWarehouse.V1.Model.CompositeDocLiveExperimentInfo.t

Default: nil
Full type: GoogleApi.ContentWarehouse.V1.Model.QualityLabelsGoogleLabelData.t

This field associates a document to particular labels and assigns confidence values to them.

Default: nil
Full type: GoogleApi.ContentWarehouse.V1.Model.Sitemap.t

Sitelinks: a collection of interesting links a user might be interested in, given they are interested in this document. WARNING: this is different from the crawler Sitemaps (see SitemapsSignals in the attachments).

Default: nil
Full type: GoogleApi.ContentWarehouse.V1.Model.CompositeDocRobotsInfoList.t

Default: nil
Full type: GoogleApi.ContentWarehouse.V1.Model.GDocumentBase.t

csePagerankCutoff (integer)
Default: nil

A URL should only be selected for the CSE Index if its pagerank is higher than cse_pagerank_cutoff.
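
The selection rule reads as a strict comparison, which can be sketched as below. The function name `eligible_for_cse_index` is hypothetical, and treating a missing cutoff (the field defaults to nil) as "no restriction" is an assumption the source does not state.

```python
def eligible_for_cse_index(pagerank, cse_pagerank_cutoff):
    """Apply the rule: select the URL only if pagerank exceeds the cutoff."""
    if cse_pagerank_cutoff is None:
        # Assumption: an unset cutoff imposes no restriction.
        return True
    # "Higher than" is read as strictly greater, so equality is rejected.
    return pagerank > cse_pagerank_cutoff
```

Under this reading, a URL whose pagerank exactly equals the cutoff is not selected.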