GDocumentBase

GDocument · Document Processing

GoogleApi.ContentWarehouse.V1.Model.GDocumentBase

SEO Impact: 7 out of 10 (High)
Next id: 127

SEO Analysis (AI Generated)

How Google processes and understands document structure; this affects how content is parsed and indexed. This model (GDocumentBase) contains SEO-relevant attributes, including Pagerank (deprecated) and PagerankNS. Key functionality includes showing a URL in search results that differs from the indexed URL (for example, in enterprise content management systems); if no display URL is set, the regular URL is used.

Actionable Insights for SEOs

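  • Monitor for ranking changes that may correlate with updates to this system
  • Consider how your content strategy aligns with the attributes this model records, such as the robots-based No*Reason restrictions and the canonical-URL duplicate grouping (Repid)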

Attributes (30)

ContentExpiryTime · integer()
Default: nil

Unix time in seconds since the epoch.
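The field is documented only by its unit. A minimal sketch, assuming the value arrives as a plain integer (or None when unset), converts it to a timezone-aware UTC timestamp. The helper names are hypothetical, and reading the value as "the content expires at this time" is an assumption based on the field name.

```python
from datetime import datetime, timezone
from typing import Optional


def content_expiry(expiry_secs: Optional[int]) -> Optional[datetime]:
    """Convert a ContentExpiryTime value (Unix seconds) to an aware UTC datetime.

    Returns None when the field is unset (Default: nil).
    """
    if expiry_secs is None:
        return None
    return datetime.fromtimestamp(expiry_secs, tz=timezone.utc)


def is_expired(expiry_secs: Optional[int], now: Optional[datetime] = None) -> bool:
    """Hypothetical helper: True if an expiry time is set and is not in the future.

    Interpreting the field as an expiry deadline is an assumption from its name;
    the source only documents the unit (seconds since the epoch).
    """
    expiry = content_expiry(expiry_secs)
    if expiry is None:
        return False
    if now is None:
        now = datetime.now(timezone.utc)
    return expiry <= now
```

For example, `content_expiry(0)` yields `1970-01-01 00:00:00+00:00`. Passing `now` explicitly keeps the check deterministic in tests.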

DisplayUrl · string
Default: nil · Full type: String.t

Sometimes the URL displayed in search results should be different from what gets indexed (e.g. in enterprise, content management systems). If this value is not set, we default to the regular URL.
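The fallback rule above fits in a few lines. This sketch assumes the document is available as a plain mapping keyed by the attribute names on this page (`DisplayUrl`, `URL`). It also assumes an empty string should count as "not set", which the description does not say explicitly.

```python
from typing import Mapping, Optional


def displayed_url(doc: Mapping[str, Optional[str]]) -> Optional[str]:
    """Return DisplayUrl if it is set, otherwise fall back to the regular URL.

    `doc` is a hypothetical dict view of the model; unset fields are None (nil)
    or simply absent.
    """
    display = doc.get("DisplayUrl")
    # Assumption: both None and "" are treated as "not set".
    if display:
        return display
    return doc.get("URL")
```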

DocId · string
Default: nil · Full type: String.t

64-bit docid of the document (usually a fingerprint of the URL, but not always). WARNING: This does NOT uniquely identify a document ANYMORE. For a unique identifier across all documents in production, refer to the field 'id().key()' (the ServingDocumentIdentifier attribute listed below).

ExternalFeedMetadata · string
Default: nil · Full type: String.t

ExternalHttpMetadata · string
Default: nil · Full type: String.t

Enterprise-specific external metadata. See http://engdoc/eng/designdocs/enterprise/enterprise_indexing_metadata.html

FilterForSafeSearch · integer()
Default: nil

Deprecated; do not use. This field has not been populated since 2012.

IPAddr · string
Default: nil · Full type: String.t

IP address in binary form (allows for IPv6).
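A binary IP address is typically stored in packed network byte order: 4 bytes for IPv4 and 16 bytes for IPv6. The sketch below assumes that layout, and the length check is what lets one field hold either version. Because the client library exposes the field as `String.t`, the second helper also assumes the string is the base64 form of the bytes, which is how proto3's JSON mapping encodes `bytes` fields. Neither assumption is confirmed by the source.

```python
import base64
import ipaddress
from typing import Union

IPAddress = Union[ipaddress.IPv4Address, ipaddress.IPv6Address]


def decode_ip_addr(raw: bytes) -> IPAddress:
    """Decode a packed, network-order IP address: 4 bytes (IPv4) or 16 bytes (IPv6)."""
    if len(raw) not in (4, 16):
        raise ValueError(f"expected 4 or 16 bytes, got {len(raw)}")
    # ipaddress.ip_address() accepts packed bytes of exactly these two lengths.
    return ipaddress.ip_address(raw)


def decode_ip_addr_b64(value: str) -> IPAddress:
    """Assumption: `value` is the base64 encoding of the packed binary address."""
    return decode_ip_addr(base64.b64decode(value))
```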

NoArchiveReason · integer()
Default: nil
NoFollowReason · integer()
Default: nil
NoImageIndexReason · integer()
Default: nil
NoImageframeOverlayReason · integer()
Default: nil
NoIndexReason · integer()
Default: nil

When any of these reasons is set to a non-zero value, the corresponding restriction applies: the document should not be indexed, show a snippet, show a cache, etc. These reasons are bitmaps of indexing.converter.RobotsInfo.RobotedReasons enum values reflecting the places where the restriction was found: //depot/google3/indexing/converter/proto/converter.proto
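The RobotedReasons values are not listed on this page, so any decoder is a sketch. The version below assumes bit i is set when enum value i applies, so the stored value is the OR of `1 << i`. Callers pass a name table, and the names used in the test are placeholders, not real enum members.

```python
from typing import Dict, List, Optional


def robots_reasons(bitmap: Optional[int], names: Dict[int, str]) -> List[str]:
    """Decode a No*Reason bitmap into reason names, lowest bit first.

    Assumption: bit i corresponds to RobotedReasons value i (1 << i).
    Bits without an entry in `names` are reported as "bit_<i>".
    """
    if not bitmap:  # None (nil) or 0: no restriction recorded
        return []
    if bitmap < 0:
        raise ValueError("bitmap must be non-negative")
    reasons = []
    bit = 0
    while bitmap >> bit:
        if (bitmap >> bit) & 1:
            reasons.append(names.get(bit, f"bit_{bit}"))
        bit += 1
    return reasons


def is_restricted(bitmap: Optional[int]) -> bool:
    """Per the field description, any non-zero value means the restriction applies."""
    return bool(bitmap)
```

Checking `is_restricted` needs no enum table at all. Only the per-location breakdown depends on the assumed bit layout.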

NoPreviewReason · integer()
Default: nil
NoSnippetReason · integer()
Default: nil
NoTranslateReason · integer()
Default: nil

Pagerank · integer()
Default: nil

This field has long been deprecated in favour of Pagerank_NS; it is no longer maintained and can break at any moment.

PagerankNS · integer()
Default: nil

Pagerank-NearestSeeds is a PageRank score for the document, calculated using the NearestSeeds method. This is the production PageRank value teams should use.

Repid · string
Default: nil · Full type: String.t

The webmirror representative ID of the canonical URL. URLs with the same repid are considered duplicates in webmirror. WARNING: use this field with caution! The webmirror duprules change frequently, so this value only reflects the duprules in force when the canonical's docjoin was built.
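Because URLs that share a repid are treated as duplicates, grouping by repid recovers the duplicate clusters. This minimal sketch assumes documents are available as hypothetical `(url, repid)` pairs. Per the warning above, the result is only a snapshot of the duprules at docjoin build time.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def group_by_repid(docs: Iterable[Tuple[str, Optional[str]]]) -> Dict[str, List[str]]:
    """Group (url, repid) pairs so that each list holds one duplicate cluster.

    Documents with no Repid (None/nil or empty) are skipped.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for url, repid in docs:
        if repid:
            groups[repid].append(url)
    return dict(groups)
```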

Default: nil · Full type: GoogleApi.ContentWarehouse.V1.Model.ScienceCitation.t

Citation data for science articles.

URL · string
Default: nil · Full type: String.t

WARNING: the URL does NOT uniquely identify a document ANYMORE. For a unique identifier across all documents in production, refer to the field 'id().key()' (the ServingDocumentIdentifier attribute listed below). Reason: foo.bar:/http and foo.bar:/http:SMARTPHONE share the same URL, but the bodies of the two documents might differ because of different crawl contexts (desktop vs. smartphone in this example).
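The desktop/smartphone example shows why a map keyed by URL silently drops documents. This sketch uses a hypothetical `Doc` tuple whose `key` field stands in for `id().key()`, and the URL value is illustrative. It indexes by key and rejects empty keys, mirroring the `CHECK(!doc_key.empty())` recommendation in the ServingDocumentIdentifier description.

```python
from typing import Dict, List, NamedTuple


class Doc(NamedTuple):
    key: str   # stands in for id().key(), the ServingDocumentIdentifier document key
    url: str   # the URL field; may be shared across crawl contexts
    body: str


def index_by_key(docs: List[Doc]) -> Dict[str, Doc]:
    """Index documents by document key, never by URL."""
    index: Dict[str, Doc] = {}
    for doc in docs:
        if not doc.key:
            raise ValueError("document key must not be empty")
        if doc.key in index:
            raise ValueError(f"duplicate document key: {doc.key}")
        index[doc.key] = doc
    return index


docs = [
    Doc("foo.bar:/http", "http://foo.bar/", "desktop body"),
    Doc("foo.bar:/http:SMARTPHONE", "http://foo.bar/", "smartphone body"),
]
```

Keying by key keeps both documents. Keying the same list by `url` collapses them into one entry.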

URLAfterRedirects · string
Default: nil · Full type: String.t

URLEncoding · integer()
Default: nil

See webutil/urlencoding

Default: nil · Full type: GoogleApi.ContentWarehouse.V1.Model.GDocumentBaseContent.t

Default: nil · Full type: list(GoogleApi.ContentWarehouse.V1.Model.GDocumentBaseDirectory.t)

ecnFp · string
Default: nil · Full type: String.t

96-bit fingerprint of the canonical url's webmirror equivalence class name as of when this cdoc was exported.

Default: nil · Full type: GoogleApi.ContentWarehouse.V1.Model.IndexingCrawlerIdServingDocumentIdentifier.t

The primary identifier of a production document is the document key given in the ServingDocumentIdentifier, which is the same as the row-key in Alexandria, and represents a URL and its crawling context. In your production code, please always assume that the document key is the only way to uniquely identify a document.

Recommended way of reading:

    const string& doc_key = cdoc.doc().id().key();
    CHECK(!doc_key.empty());

More background information can be found in google3/indexing/crawler_id/servingdocumentidentifier.proto. The ServingDocumentIdentifier uniquely identifies a document in serving and also distinguishes between experimental vs. production documents. The SDI is also used as an input for the union/muppet key generation in serving.

localsearchDocInfo · LocalsearchDocInfo
Default: nil · Full type: GoogleApi.ContentWarehouse.V1.Model.LocalsearchDocInfo.t

Localsearch-specific data.

Default: nil · Full type: GoogleApi.ContentWarehouse.V1.Model.OceanDocInfo.t

Ocean-specific data.

Default: nil · Full type: GoogleApi.ContentWarehouse.V1.Model.GDocumentBaseOriginalContent.t

userAgentName · string
Default: nil · Full type: String.t

The user agent name used to crawl the URL. See //crawler/engine/webmirror_user_agents.h for the list of user agents (e.g. crawler::WebmirrorUserAgents::kGoogleBot). NOTE: This field is copied from the first WEBMIRROR FetchReplyClientInfo in the trawler_fetch_info column, and is left unpopulated if no WEBMIRROR FetchReplyClientInfo is found. As of the submission of cl/51488336, Alexandria populates this field. However, docjoins from freshdocs (or any other source) won't have this field populated, because we believe no one needs to read this field from freshdocs docjoins.
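The copy rule is "take the first WEBMIRROR entry, otherwise leave the field unset". The sketch below illustrates it with a hypothetical, heavily simplified `FetchReplyClientInfo` tuple. The real message and the trawler_fetch_info column have structure not shown here, and the `"WEBMIRROR"` string is an illustrative stand-in for however the client is actually identified.

```python
from typing import Iterable, NamedTuple, Optional


class FetchReplyClientInfo(NamedTuple):
    # Hypothetical stand-in: only the two fields needed for this illustration.
    client: str            # illustrative client label, e.g. "WEBMIRROR"
    user_agent_name: str


def pick_user_agent_name(infos: Iterable[FetchReplyClientInfo]) -> Optional[str]:
    """Return the user agent of the first WEBMIRROR entry, or None if there is none."""
    for info in infos:
        if info.client == "WEBMIRROR":
            return info.user_agent_name
    return None  # field left unpopulated
```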