GDocumentBase
GDocument · Document Processing · GoogleApi.ContentWarehouse.V1.Model.GDocumentBase
SEO Analysis
AI Generated: How Google processes and understands document structure; this affects how content is parsed and indexed. This model (GDocumentBase) contains SEO-relevant attributes, including Pagerank and PagerankNS. Key functionality includes a display URL override: sometimes the URL displayed in search results should be different from what gets indexed (e.g. in enterprise, content management systems). If this val...
Actionable Insights for SEOs
- Monitor for changes in rankings that may correlate with updates to this system
- Consider how your content strategy aligns with what this signal evaluates
Attributes (30)
ContentExpiryTime: integer(), default nil. Unix seconds since the epoch.
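A minimal sketch of interpreting ContentExpiryTime, assuming the value is a plain count of seconds since the Unix epoch in UTC. The helper names `expiry_as_datetime` and `is_expired` are hypothetical, and treating an unset value as "never expires" is an assumption the source does not confirm.

```python
from datetime import datetime, timezone
from typing import Optional


def expiry_as_datetime(content_expiry_time: Optional[int]) -> Optional[datetime]:
    """Convert Unix seconds to a timezone-aware UTC datetime; None stays None."""
    if content_expiry_time is None:
        return None
    return datetime.fromtimestamp(content_expiry_time, tz=timezone.utc)


def is_expired(content_expiry_time: Optional[int], now: datetime) -> bool:
    """Return True when an expiry time is set and `now` has reached it."""
    # Assumption: an unset (nil) value means the content has no expiry.
    expiry = expiry_as_datetime(content_expiry_time)
    return expiry is not None and now >= expiry
```

Passing `now` explicitly keeps the check deterministic and easy to test.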
DisplayUrl: string, default nil. Full type: String.t. Sometimes the URL displayed in search results should be different from what gets indexed (e.g. in enterprise, content management systems). If this value is not set, we default to the regular URL.
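The fallback described for DisplayUrl can be sketched as follows. The function name `url_to_display` is hypothetical, and treating an empty string the same as nil is an assumption.

```python
from typing import Optional


def url_to_display(display_url: Optional[str], url: str) -> str:
    """Pick the URL to show: DisplayUrl when set, otherwise the regular URL."""
    # Assumption: "not set" covers both None and the empty string.
    return display_url if display_url else url
```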
DocId: string, default nil. Full type: String.t. 64-bit docid of the document (usually a fingerprint of the URL, but not always). WARNING: This does NOT uniquely identify a document ANYMORE. For a unique identifier across all documents in production, please refer to the field 'id().key()'.
ExternalFeedMetadata: string, default nil. Full type: String.t.
ExternalHttpMetadata: string, default nil. Full type: String.t. Enterprise-specific external metadata. See http://engdoc/eng/designdocs/enterprise/enterprise_indexing_metadata.html
FilterForSafeSearch: integer(), default nil. Deprecated; do not use. This field has not been populated since 2012.
IPAddr: string, default nil. Full type: String.t. IP address in binary form (allows for IPv6).
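A minimal sketch of decoding a binary IP address, assuming the field holds the packed address in network byte order (4 bytes for IPv4, 16 bytes for IPv6). The source only says "in binary", so the exact encoding is an assumption; `decode_ip` is a hypothetical helper built on Python's standard `ipaddress` module.

```python
import ipaddress
from typing import Union


def decode_ip(raw: bytes) -> Union[ipaddress.IPv4Address, ipaddress.IPv6Address]:
    """Decode a packed IP address (assumed network byte order)."""
    # Assumption: 4 bytes means IPv4 and 16 bytes means IPv6.
    if len(raw) not in (4, 16):
        raise ValueError(f"unexpected packed address length: {len(raw)}")
    return ipaddress.ip_address(raw)
```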
NoArchiveReason: integer(), default nil.
NoFollowReason: integer(), default nil.
NoImageIndexReason: integer(), default nil.
NoImageframeOverlayReason: integer(), default nil.
NoIndexReason: integer(), default nil. When these reasons are set to a non-zero value, the document should not be indexed, or show a snippet, or show a cache, etc. These reasons are bitmaps of indexing.converter.RobotsInfo.RobotedReasons enum values reflecting the places where the restriction was found: //depot/google3/indexing/converter/proto/converter.proto
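The bitmap convention for the reason fields can be illustrated with a short sketch. The flag names and bit values in `RobotedReason` below are hypothetical placeholders: the real RobotedReasons enum lives in converter.proto and is not reproduced in this document. Only the "non-zero means restricted" rule comes from the description above.

```python
from enum import IntFlag
from typing import List, Optional


class RobotedReason(IntFlag):
    # Hypothetical bit assignments, for illustration only.
    ROBOTS_TXT = 1
    META_TAG = 2
    HTTP_HEADER = 4


def is_restricted(reason: Optional[int]) -> bool:
    """A restriction applies when the reason bitmap is set and non-zero."""
    return bool(reason)


def reasons_found(reason: Optional[int]) -> List[str]:
    """List the names of the (hypothetical) flags set in the bitmap."""
    if not reason:
        return []
    return [flag.name for flag in RobotedReason if reason & flag.value]
```

For example, under these made-up bit values a NoIndexReason of 5 would mean the restriction was found in two places at once.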
NoPreviewReason: integer(), default nil.
NoSnippetReason: integer(), default nil.
NoTranslateReason: integer(), default nil.
Pagerank: integer(), default nil. This field is long deprecated in favour of Pagerank_NS (PagerankNS); it is no longer maintained and can break at any moment.
PagerankNS: integer(), default nil. Pagerank-NearestSeeds is a PageRank score for the doc, calculated using the NearestSeeds method. This is the production PageRank value teams should use.
Repid: string, default nil. Full type: String.t. The webmirror representative id of the canonical URL. URLs with the same repid are considered duplicates in webmirror. WARNING: use this field with caution! The webmirror duprules change frequently, so this value only reflects the duprules at the time when the canonical's docjoin was built.
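As a rough illustration of "URLs with the same repid are considered duplicates", the sketch below groups URLs by repid. The `(url, repid)` tuple shape and the helper name `group_by_repid` are assumptions; as the warning above notes, any such grouping only reflects the duprules in force when the docjoin was built.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def group_by_repid(docs: Iterable[Tuple[str, Optional[str]]]) -> Dict[str, List[str]]:
    """Group URLs that share a repid; documents without a repid are skipped."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for url, repid in docs:
        if repid:
            groups[repid].append(url)
    return dict(groups)
```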
ScienceMetadata: ScienceCitation, default nil. Full type: GoogleApi.ContentWarehouse.V1.Model.ScienceCitation.t. Citation data for science articles.
URL: string, default nil. Full type: String.t. WARNING: the URL does NOT uniquely identify a document ANYMORE. For a unique identifier across all documents in production, please refer to the field 'id().key()'. Reason: foo.bar:/http and foo.bar:/http:SMARTPHONE share the same URL, but the body of the two documents might differ because of different crawl context (desktop vs. smartphone in this example).
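The collision described in the warning can be made concrete with a small sketch. The two document keys are taken from the example above; the shared URL string, the bodies, and the dictionary record shape are made up for illustration.

```python
from typing import Dict, List

# Illustrative records only: the keys come from the example in the text,
# while the URL string and bodies are invented.
DOCS = [
    {"key": "foo.bar:/http", "url": "http://foo.bar/", "body": "desktop body"},
    {"key": "foo.bar:/http:SMARTPHONE", "url": "http://foo.bar/", "body": "smartphone body"},
]


def index_by(docs: List[dict], field: str) -> Dict[str, dict]:
    """Build a lookup table; later records overwrite earlier ones on collision."""
    return {doc[field]: doc for doc in docs}


by_url = index_by(DOCS, "url")  # the two documents collapse into one entry
by_key = index_by(DOCS, "key")  # both documents are kept
```

Indexing by URL silently drops one of the two crawl contexts, which is why the document key is the safer lookup field.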
URLAfterRedirects: string, default nil. Full type: String.t.
URLEncoding: integer(), default nil. See webutil/urlencoding
content: GDocumentBaseContent, default nil. Full type: GoogleApi.ContentWarehouse.V1.Model.GDocumentBaseContent.t.
directory: GDocumentBaseDirectory, default nil. Full type: list(GoogleApi.ContentWarehouse.V1.Model.GDocumentBaseDirectory.t).
ecnFp: string, default nil. Full type: String.t. 96-bit fingerprint of the canonical URL's webmirror equivalence class name as of when this cdoc was exported.
id: IndexingCrawlerIdServingDocumentIdentifier, default nil. Full type: GoogleApi.ContentWarehouse.V1.Model.IndexingCrawlerIdServingDocumentIdentifier.t. The primary identifier of a production document is the document key given in the ServingDocumentIdentifier, which is the same as the row-key in Alexandria, and represents a URL and its crawling context. In your production code, please always assume that the document key is the only way to uniquely identify a document.
Recommended way of reading:
    const string& doc_key = cdoc.doc().id().key();
    CHECK(!doc_key.empty());
More background information can be found in google3/indexing/crawler_id/servingdocumentidentifier.proto. The ServingDocumentIdentifier (SDI) uniquely identifies a document in serving and also distinguishes between experimental and production documents. The SDI is also used as an input for the union/muppet key generation in serving.
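For readers working outside C++, the same "read the key, then check it is non-empty" pattern might look like the sketch below. It assumes the document has been decoded into nested dictionaries with an `id` entry containing a `key` string; that shape and the helper name `doc_key` are assumptions, not part of the documented API.

```python
def doc_key(doc: dict) -> str:
    """Return the document key, failing loudly when it is missing or empty."""
    # Assumption: the decoded document is a dict shaped like {"id": {"key": "..."}}.
    key = (doc.get("id") or {}).get("key") or ""
    if not key:
        raise ValueError("document has no non-empty id.key")
    return key
```

Raising on an empty key mirrors the intent of the CHECK in the recommended C++ snippet: code should not fall back to the URL or DocId as an identifier.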
localsearchDocInfo: LocalsearchDocInfo, default nil. Full type: GoogleApi.ContentWarehouse.V1.Model.LocalsearchDocInfo.t. Localsearch-specific data.
oceanDocInfo: OceanDocInfo, default nil. Full type: GoogleApi.ContentWarehouse.V1.Model.OceanDocInfo.t. Ocean-specific data.
originalcontent: GDocumentBaseOriginalContent, default nil. Full type: GoogleApi.ContentWarehouse.V1.Model.GDocumentBaseOriginalContent.t.
userAgentName: string, default nil. Full type: String.t. The user agent name used to crawl the URL. See //crawler/engine/webmirror_user_agents.h for the list of user agents (e.g. crawler::WebmirrorUserAgents::kGoogleBot). NOTE: This field is copied from the first WEBMIRROR FetchReplyClientInfo in the trawler_fetch_info column. The field is left unpopulated if no WEBMIRROR FetchReplyClientInfo is found. As of the submission of cl/51488336, Alexandria populates this field. However, docjoins from freshdocs (or any other source) won't have this field populated, because we believe no one needs to read this field from freshdocs docjoins.