PerDocData
Per Doc Data
Per-Document Signal
GoogleApi.ContentWarehouse.V1.Model.PerDocData
SEO Analysis
AI Generated
Stores per-document data signals that are maintained for each indexed page. These document-level signals are core inputs to Google's ranking algorithms and may include quality scores, topical classifications, and other per-page assessments. Changes to these signals can directly affect a page's ranking potential.
Actionable Insights for SEOs
- Monitor for changes in rankings that may correlate with updates to this system
- Consider how your content strategy aligns with what this signal evaluates
Attributes (132)
scienceDoctypeinteger(nilScholar/Science Document type. <0: not a Science Document (default); 0: Science doc, fully visible; >0: Science doc with limited visibility, where the number is the number of visible terms.
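The three value ranges of scienceDoctype can be read with a small helper. This is a minimal sketch: the function name and the returned labels are hypothetical, and only the ranges come from the field description.

```python
def describe_science_doctype(value: int) -> str:
    """Map a scienceDoctype value to the meaning given in the field description."""
    if value < 0:
        # Negative values (the default) mean the page is not a Science document.
        return "not a science document"
    if value == 0:
        return "science document, fully visible"
    # Positive values carry the number of visible terms.
    return f"science document, limited visibility ({value} visible terms)"
```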
ScaledExptIndyRank2integer(nilexperimental
videoLanguageQualityVidyaVideoLanguageVideoLanguage →nilFull type: GoogleApi.ContentWarehouse.V1.Model.QualityVidyaVideoLanguageVideoLanguage.tAudio-based language classified by Automatic Language Identification (only for watch pages).
phildataPhilPerDocData →nilFull type: GoogleApi.ContentWarehouse.V1.Model.PhilPerDocData.t
uacSpamScoreinteger(nilThe uac spam score is represented in 7 bits, going from 0 to 127. The threshold is 64: a score >= 64 is considered uac spam.
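The threshold rule for uacSpamScore is simple enough to sketch directly. The helper below is illustrative only: its name is hypothetical, and the 0 to 127 range and the threshold of 64 are taken from the field description.

```python
UAC_SPAM_THRESHOLD = 64  # stated in the field description
MAX_7BIT_SCORE = 127     # largest value that fits in 7 bits


def is_uac_spam(score: int) -> bool:
    """Return True when a 7-bit uac spam score meets the documented threshold."""
    if not 0 <= score <= MAX_7BIT_SCORE:
        raise ValueError(f"score {score} is outside the 7-bit range 0..127")
    return score >= UAC_SPAM_THRESHOLD
```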
DEPRECATEDAuthorObfuscatedGaiastringnilFull type: list(String.tThe obfuscated google profile gaia id(s) of the author(s) of the document. This field is deprecated, use the string version.
spamtokensContentScorenumber(nilFor SpamTokens content scores. Used in SiteBoostTwiddler to determine whether a page is UGC Spam. See go/spamtokens-dd for details.
webrefEntitiesRepositoryWebrefWebrefMustangAttachment →nilFull type: GoogleApi.ContentWarehouse.V1.Model.RepositoryWebrefWebrefMustangAttachment.tWebRef entities associated to the document. See go/webref for details.
PremiumDataPremiumPerDocData →nilFull type: GoogleApi.ContentWarehouse.V1.Model.PremiumPerDocData.tAdditional metadata for Premium document in the Google index.
spamMuppetSignalsSpamMuppetjoinsMuppetSignals →nilFull type: GoogleApi.ContentWarehouse.V1.Model.SpamMuppetjoinsMuppetSignals.tContains hacked site signals which will be used in query time joins. As of Oct'19, the field is stored in a separate corpus. It'll only be populated for in-flight requests between retrieve and full-score in perdocdata. So no extra storage is needed on muppet side.
knexAnnotationSocialPersonalizationKnexAnnotation →nilFull type: GoogleApi.ContentWarehouse.V1.Model.SocialPersonalizationKnexAnnotation.tFor indexing k'nex annotations for FreshDocs.
smartphoneDataSmartphonePerDocData →nilFull type: GoogleApi.ContentWarehouse.V1.Model.SmartphonePerDocData.tAdditional metadata for smartphone documents in the Google index.
semanticDateConfidenceinteger(nilDEPRECATED: semantic_date_confidence replaced by semantic_date_info.
trendspamScoreinteger(nilFor now, the count of matching trendspam queries.
ScaledSpamScoreYoraminteger(nilSpamscores are represented as a 7-bit integer, going from 0 to 127.
numUrlsinteger(nilTotal number of urls encoded in the url section = # of alternate urls + 1
datesInfostringnilFull type: String.tStores dates-related info (e.g. page is old based on its date annotations). Used in FreshnessTwiddler. Use encode/decode functions from quality/timebased/utils/dates-info-helper-inl.h
pagerank2number(nil
nsrDataProtoQualityNsrNsrData →nilFull type: GoogleApi.ContentWarehouse.V1.Model.QualityNsrNsrData.tStripped site-level signals, not present in the explicit nsr_* fields, nor compressed_quality_signals.
fringeQueryPriorQualityFringeFringeQueryPriorPerDocData →nilFull type: GoogleApi.ContentWarehouse.V1.Model.QualityFringeFringeQueryPriorPerDocData.tContains encoded FringeQueryPrior information. Unlikely to be meaningful for anyone other than fringe-ranking team. Contact fringe-ranking team if any questions, but do NOT use directly without consulting them.
kaltixdataKaltixPerDocData →nilFull type: GoogleApi.ContentWarehouse.V1.Model.KaltixPerDocData.t
ymylHealthScoreinteger(nilStores scores of ymyl health classifier as defined at go/ymyl-classifier-dd. To use this field, you MUST join g/pq-classifiers-announce and add your use case at http://shortn/_nfg9oAldou.
authorObfuscatedGaiaStrstringnilFull type: list(String.t
lastSignificantUpdatestringnilFull type: String.tLast significant update of the document. This is sourced from the quality_timebased.LastSignificantUpdate proto as computed by the LSUSelector from various signals. The value is a UNIX timestamp in seconds.
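Since lastSignificantUpdate is typed as a string but described as a UNIX timestamp in seconds, reading it might look like the sketch below. It assumes the string holds a plain decimal integer, which the source does not confirm; the function name is hypothetical.

```python
from datetime import datetime, timezone


def parse_last_significant_update(raw: str) -> datetime:
    """Convert a seconds-since-epoch string into a timezone-aware UTC datetime.

    Assumption: the field is a decimal integer encoded as text.
    """
    return datetime.fromtimestamp(int(raw), tz=timezone.utc)
```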
spambrainDataSpamBrainData →nilFull type: GoogleApi.ContentWarehouse.V1.Model.SpamBrainData.tHost-v1 sitechunk level scores coming from spambrain.
DEPRECATEDQuarantineWhitelistboolean(nil
tundraClusterIdinteger(nilThis field is propagated to shards. Stores clustering information on a site level for the Tundra project. This field is deprecated - use the equivalent field inside nsr_data_proto instead.
bodyWordsToTokensRatioTotalnumber(nil
homepagePagerankNsinteger(nilThe page-rank of the homepage of the site. Copied from the cdoc.doc().pagerank_ns() of the homepage.
topPetacatTaxIdinteger(nilTop petacat of the site. Used in SiteboostTwiddler to determine result/query matching.
OriginalContentScoreinteger(nilThe original content score is represented in 7 bits, going from 0 to 127. Only pages with little content have this field. The actual original content score ranges from 0 to 512. It is encoded with quality_q2::OriginalContentUtil::EncodeOriginalContentScore(). To decode the value, use quality_q2::OriginalContentUtil::DecodeOriginalContentScore().
contentAttributionsContentAttributions →nilFull type: GoogleApi.ContentWarehouse.V1.Model.ContentAttributions.t
webmirrorEcnFpstringnilFull type: String.t
DocLevelSpamScoreinteger(nilThe document spam score is represented in 7 bits, going from 0 to 127.
urlPoisoningDataUrlPoisoningData →nilFull type: GoogleApi.ContentWarehouse.V1.Model.UrlPoisoningData.tContains url poisoning data for suppressing spam documents.
EventPerDocDebugEvent →nilFull type: list(GoogleApi.ContentWarehouse.V1.Model.PerDocDebugEvent.tFree form debug info. NB2: consider carefully what to save here. It's easy to eat lots of gfs space with debug info that nobody needs...
mediaOrPeopleEntitiesImageQualitySensitiveMediaOrPeopleEntities →nilFull type: GoogleApi.ContentWarehouse.V1.Model.ImageQualitySensitiveMediaOrPeopleEntities.tContains the mids of the 5 most topical entities annotated with selected KG collections. This information is currently used on Image Search to detect cases where results converged to mostly a single person or media entity. More details: go/result-set-convergence.
scaledSelectionTierRankinteger(nilSelection tier rank is a language normalized score ranging from 0-32767 over the serving tier (Base, Zeppelins, Landfills) for this document. This is converted back to fractional position within the index tier by scaled_selection_tier_rank/32767.
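The conversion described for scaledSelectionTierRank is a single division. A minimal sketch, with a hypothetical function name and a range check added for illustration:

```python
MAX_SCALED_TIER_RANK = 32767  # upper bound given in the field description


def fractional_tier_position(scaled_selection_tier_rank: int) -> float:
    """Convert the 0..32767 scaled rank into a fractional position in 0.0..1.0."""
    if not 0 <= scaled_selection_tier_rank <= MAX_SCALED_TIER_RANK:
        raise ValueError("scaled rank must be between 0 and 32767")
    return scaled_selection_tier_rank / MAX_SCALED_TIER_RANK
```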
pageTagslist(integer(nil
smearingMaxTotalOffdomainAnchorsinteger(nil
pageranknumber(nilExperimental pageranks (DEPRECATED; only pagerank in MustangBasicInfo is used).
QuarantineInfointeger(nilbitmask of QuarantineBits (or'd together) used to store quarantine related information. For example: QUARANTINE_WHITELIST | QUARANTINE_URLINURL.
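A bitmask such as QuarantineInfo is usually tested bit by bit. In the sketch below only the two flag names come from the field description; the numeric bit values are hypothetical placeholders, because the source does not list the real QuarantineBits values.

```python
from enum import IntFlag


class QuarantineBits(IntFlag):
    # Hypothetical bit assignments, chosen only for illustration.
    QUARANTINE_WHITELIST = 1
    QUARANTINE_URLINURL = 2


def has_quarantine_bit(quarantine_info: int, bit: QuarantineBits) -> bool:
    """Return True when every bit in `bit` is set in the stored bitmask."""
    return (quarantine_info & bit) == bit


# Combining flags with bitwise OR, as in the example from the description.
combined = QuarantineBits.QUARANTINE_WHITELIST | QuarantineBits.QUARANTINE_URLINURL
```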
rosettaLanguagesstringnilFull type: list(String.tTop two document language BCP-47 codes as generated by the RosettaLanguageAnnotator in the decreasing order of probability.
freshnessEncodedSignalsstringnilFull type: String.tStores freshness and aging related data, such as time-related quality metrics predicted from url-pattern level signals. Use the encoding decoding API in quality/freshness/docclassifier/aging/encoded-pattern-signals.h This field is deprecated.
imagedataImagePerDocData →nilFull type: GoogleApi.ContentWarehouse.V1.Model.ImagePerDocData.t
videoCorpusDocidstringnilFull type: String.t
queriesForWhichOfficialOfficialPagesQuerySet →nilFull type: GoogleApi.ContentWarehouse.V1.Model.OfficialPagesQuerySet.tThe set of (query, country, language) triples for which this document is considered to be the official page. For example, www.britneyspears.com would be official for ("britney spears", "us", 0) and others (0 is English).
nsrIsCovidLocalAuthorityboolean(nilThis field is propagated to shards. In addition, it is populated at serving time by go/web-signal-joins. This field is deprecated - use the equivalent field inside nsr_data_proto instead.
crawlerIdProtoLogsProtoIndexingCrawlerIdCrawlerIdProto →nilFull type: GoogleApi.ContentWarehouse.V1.Model.LogsProtoIndexingCrawlerIdCrawlerIdProto.tFor crawler-ID variations, the crawling context applied to the document. See go/url, and the description in google3/indexing/crawler_id
ScaledSpamScoreEricinteger(nil
biasingdataBiasingPerDocData →nilFull type: GoogleApi.ContentWarehouse.V1.Model.BiasingPerDocData.t
ScaledExptSpamScoreEricinteger(nil
v2KnexAnnotationQualitySherlockKnexAnnotation →nilFull type: GoogleApi.ContentWarehouse.V1.Model.QualitySherlockKnexAnnotation.tFor indexing v2 k'nex, see go/knex-v2-doc-annotation for details.
MobileDataMobilePerDocData →nilFull type: GoogleApi.ContentWarehouse.V1.Model.MobilePerDocData.tAdditional metadata for lowend mobile documents in the Google index.
BookCitationDataBookCitationPerDocData →nilFull type: GoogleApi.ContentWarehouse.V1.Model.BookCitationPerDocData.tthe book citation data for each web page, the average size is about 10 bytes
semanticDateinteger(nilSemanticDate, estimated date of the content of a document based on the contents of the document (via parsing), anchors and related documents. Date is encoded as a 32-bits UNIX date (1970 Jan 1 epoch). Confidence is encoded using a SemanticDate specific format. For details of encoding, please refer to quality/freshness/docclassifier/semanticdate/public/semantic_date.proto
biasingdata2BiasingPerDocData2 →nilFull type: GoogleApi.ContentWarehouse.V1.Model.BiasingPerDocData2.tA replacement for BiasingPerDocData that is more space efficient. Once this is live everywhere, biasingdata will be deprecated.
ymylNewsScoreinteger(nilStores scores of ymyl news classifier as defined at go/ymyl-classifier-dd. To use this field, you MUST join g/pq-classifiers-announce and add your use case at http://shortn/_nfg9oAldou.
saftLanguageIntlist(integer(nilTop document language as generated by SAFT LangID. For now we store bare minimum: just the top 1 language value, converted to the language enum, and only when different from the first value in 'languages'.
nilFull type: GoogleApi.ContentWarehouse.V1.Model.RepositoryAnnotationsRdfaRdfaRichSnippetsApplication.tApplication information associated to the document.
domainAgeinteger(nil16-bit
lastSignificantUpdateInfostringnilFull type: String.tMetadata about last significant update. Currently this only encodes the quality_timebased.LastSignificantUpdate.source field which contains the info on the source of the signal. NOTE: Please do not read the value directly. Use helpers from quality/timebased/lastsignificantupdate/lsu-helper.h instead.
pagerank1number(nil
spamCookbookActionSpamCookbookAction →nilFull type: GoogleApi.ContentWarehouse.V1.Model.SpamCookbookAction.tActions based on Cookbook recipes that match the page.
compressedUrlstringnilFull type: String.tCompressed URL string used for SETI.
extraDataProto2BridgeMessageSet →nilFull type: GoogleApi.ContentWarehouse.V1.Model.Proto2BridgeMessageSet.tThis field is available only in the docjoins: it is cleared before building per-doc data in both Mustang and Teragoogle (MessageSet is inefficient in space for serving data). Use this for all new fields that aren't needed during serving. Currently this field contains:
- UrlSignals for the document level spam classifier (when the doclevelspamscore is set).
- PerDocLangidData and realtimespam::ClassifierResult for the document level fresh spam classifier (when the doc-level fresh spam score is generated).
- MicroblogDocQualitySignals for document-level microblog spam classifier. This only exists in Firebird for now.
- spam_buckets::BucketsData for a document-structure hash.
This field is non-personal since the personal fields in MessageSet are not populated in production.
socialgraphNodeNameFpstringnilFull type: String.tFor Social Search we store the fingerprint of the SG node name. This is used in one of the superroot's PRE_DOC twiddlers as a lookup key for the full Social Search data. PRE_DOC = twiddlers firing before the DocInfo request is sent to the mustang backend.
urlAfterRedirectsFpstringnilFull type: String.tThese two fingerprints are used for de-duping results in a twiddler. They should only be populated by freshdocs, and will only be present for documents that are chosen to be canonicals in a cluster whose previous canonical is also in the index. Additionally, url_after_redirects_fp is only present if it is different from a fingerprint of the URL.
localizedClusterIndexingDupsLocalizedLocalizedCluster →nilFull type: GoogleApi.ContentWarehouse.V1.Model.IndexingDupsLocalizedLocalizedCluster.tInformation on localized clusters, which is the relationship of translated and/or localized pages.
pageregionsstringnilFull type: String.tString that encodes the position ranges for different regions of the document. See "indexer/pageregion.h" for an explanation, and how to decode the string
KeywordStuffingScoreinteger(nilThe keyword stuffing score is represented in 7 bits, going from 0 to 127.
spambrainTotalDocSpamScorenumber(nilThe document total spam score identified by spambrain, going from 0 to 1.
noimageframeoverlayreasoninteger(nilIf not 0, we should not show the image in overlay mode in image snippets
scienceHoldingsIdsstringnilFull type: list(String.tDeprecated 2016/01/14.
crawlPagerankinteger(nilThis field is used internally by the docjoiner to forward the crawl pageranks from original canonicals to canonicals we actually chose; outside sources should not set it, and it should not be present in actual docjoins or the index.
BlogDataBlogPerDocData →nilFull type: GoogleApi.ContentWarehouse.V1.Model.BlogPerDocData.t
nsrIsVideoFocusedSiteboolean(nilThis field is propagated to shards. It will also be populated at serving time by go/web-signal-joins (see b/170607253). Bit indicating whether this site is video-focused, but not hosted on any major known video hosting domains. This field is deprecated - use the equivalent field inside nsr_data_proto instead.
ScaledExptSpamScoreYoraminteger(nil
spamrankinteger(nilThe spamrank measures the likelihood that this document links to known spammers. Its value is between 0 and 65535.
compressedQualitySignalsCompressedQualitySignals →nilFull type: GoogleApi.ContentWarehouse.V1.Model.CompressedQualitySignals.t
videodataVideoPerDocData →nilFull type: GoogleApi.ContentWarehouse.V1.Model.VideoPerDocData.t
s3AudioLanguageS3AudioLanguageS3AudioLanguage →nilFull type: GoogleApi.ContentWarehouse.V1.Model.S3AudioLanguageS3AudioLanguage.tPrimary video's audio language classified by S3 based Automatic Language Identification (only for watch pages).
watchpageLanguageResultWatchpageLanguageWatchPageLanguageResult →nilFull type: GoogleApi.ContentWarehouse.V1.Model.WatchpageLanguageWatchPageLanguageResult.tLanguage classified by the WatchPageLanguage Model (go/watchpage-language). Only present for watch pages.
appsLinkQualityCalypsoAppsLink →nilFull type: GoogleApi.ContentWarehouse.V1.Model.QualityCalypsoAppsLink.tAppsLink contains Android application IDs in outlinks. It is used to improve results ranking within applications universal. See http://go/apps-universal for the project details.
desktopInterstitialsIndexingMobileInterstitialsProtoDesktopInterstitials →nilFull type: GoogleApi.ContentWarehouse.V1.Model.IndexingMobileInterstitialsProtoDesktopInterstitials.tContains desktop interstitials signal for VOLT ranking change.
liveResultsDataWeboftrustLiveResultsDocAttachments →nilFull type: GoogleApi.ContentWarehouse.V1.Model.WeboftrustLiveResultsDocAttachments.t
crowdingdataCrowdingPerDocData →nilFull type: GoogleApi.ContentWarehouse.V1.Model.CrowdingPerDocData.t
nsrSitechunkstringnilFull type: String.tSiteChunk computed for nsr. In some cases it can use more information than just url (e.g. youtube channels). See NsrAnnotator for details. If sitechunk is longer than --populate_nsr_sitechunk_max_length (default=100), it will not get populated. This field might be compressed and needs to be decoded with quality_nsr::util::DecodeNsrSitechunk. See go/nsr-chunks for more details. This field contains only nontrivial primary chunks.
originalTitleHardTokenCountinteger(nilThe number of hard tokens in the title.
hostAgeinteger(nilThe earliest firstseen date of all pages in this host/domain. This data is used in a twiddler to sandbox fresh spam at serving time. The value is 16 bits and stores the day number after 2005-12-31; all earlier dates are set to 0. If this url's host_age == domain_age, then domain_age is omitted. Please use //spam/content/siteage-util.h to convert between the day number and epoch seconds. Regarding usage of sentinel values: we would like to check whether a value exists in the scoring bundle when it is used in Ranklab AST. A sentinel value lets us know whether the field exists or holds the sentinel (in the case it does not exist). 16-bit
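The day-number encoding of hostAge can be illustrated with ordinary date arithmetic. This is a rough sketch under stated assumptions: it treats the value as a count of days added to 2005-12-31, and the helper name is hypothetical. The real conversion lives in //spam/content/siteage-util.h and may differ in detail.

```python
from datetime import date, timedelta

SITE_AGE_EPOCH = date(2005, 12, 31)  # reference day named in the field description
MAX_16BIT = 65535


def host_age_to_date(host_age: int) -> date:
    """Turn a 16-bit day number into a calendar date.

    A value of 0 maps to the reference day itself and should be read as
    "on or before 2005-12-31", since earlier dates are stored as 0.
    """
    if not 0 <= host_age <= MAX_16BIT:
        raise ValueError("host_age must fit in 16 bits")
    return SITE_AGE_EPOCH + timedelta(days=host_age)
```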
inNewsstandboolean(nilThis field indicates whether the document is in the newsstand corpus.
origininteger(nil
nilFull type: GoogleApi.ContentWarehouse.V1.Model.QualityRichsnippetsAppsProtosLaunchAppInfoPerDocData.tInfo on how to launch a mobile app to consume this document's content, if applicable (see go/calypso).
eventsDatestringnilFull type: list(String.tDate for Events. A web page might list multiple events with different dates. We only take one date (start date) per event.
homePageInfointeger(nil
GibberishScoreinteger(nilThe gibberish score is represented in 7 bits, going from 0 to 127.
toolbarPagerankinteger(nilA copy of the value stored in /namespace/indexing/wwwglobal//fakepr/* for this document. A value of quality_bakery::FakeprUtils::kUnknownToolbarPagerank indicates that we don't have toolbar pagerank for this document. A value between 0 and 10 (inclusive) means that this is the toolbar pagerank of the page. Finally, if this value is not set it means that the toolbar pagerank is equivalent to: quality_bakery::FakeprUtils::EstimatePreDemotionFromPagerankNearestSeeds( basic_info.pagerank_ns()) called on the MustangBasicInfo attachment for the same document.
freshboxArticleScoresinteger(nilStores scores of freshness-related classifiers: freshbox article score, live blog score and host-level article score. The encoding/decoding API is in quality/freshness/freshbox/goldmine/freshbox_annotation_encoder.h. To use this field, you MUST join g/pq-classifiers-announce and add your use case at http://shortn/_RYXS2lX2IV.
WhirlpoolDiscountnumber(nil
ScaledExptIndyRank3integer(nilexperimental
ToolBarDataToolBarPerDocData →nilFull type: GoogleApi.ContentWarehouse.V1.Model.ToolBarPerDocData.t
nsrIsElectionAuthorityboolean(nilThis field is propagated to shards. It will also be populated at serving time by go/web-signal-joins (see b/168114815). This field is deprecated - use the equivalent field inside nsr_data_proto instead.
onsiteProminenceinteger(nilOnsite prominence measures the importance of the document within its site. It is computed by propagating simulated traffic from the homepage and high craps click pages. It is a 13-bit int.
travelGoodSitesInfoQualityTravelGoodSitesData →nilFull type: GoogleApi.ContentWarehouse.V1.Model.QualityTravelGoodSitesData.tThis field stores information about good travel sites.
IsAnchorBayesSpamboolean(nilIs this document considered spam by the anchor bayes classifier?
isHotdocboolean(nilSet by the FreshDocs instant doc joiner. See //indexing/instant/hotdocs/README and http://go/freshdocs-hotdocs.
commercialScorenumber(nilA measure of commerciality of the document. Score > 0 indicates the document is commercial (i.e. sells something). Computed by repository/pageclassifiers/parsehandler-commercial.cc
asteroidBeltIntentsQualityOrbitAsteroidBeltDocumentIntentScores →nilFull type: GoogleApi.ContentWarehouse.V1.Model.QualityOrbitAsteroidBeltDocumentIntentScores.tFor indexing Asteroid Belt intent scores. See go/asteroid-belt for details.
TagPageScoreinteger(nilTag-site-ness of a page, represented in 7 bits, ranging from 0 to 100. A smaller value means a worse tag page.
geodatastringnilFull type: String.tgeo data; approx 24 bytes for 23M U.S. pages
oceandataOceanPerDocData →nilFull type: GoogleApi.ContentWarehouse.V1.Model.OceanPerDocData.t28 bytes per page, only in the Ocean index
pagerank0number(nil
SpamWordScoreinteger(nilThe spamword score is represented in 7 bits, going from 0 to 127.
ScaledIndyRankinteger(nilThe independence rank is represented as a 16-bit integer, which is multiplied by (max_indy_rank / 65536) to produce actual independence rank values. max_indy_rank is typically 0.84.
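The scaling rule for ScaledIndyRank translates directly into one multiplication. The sketch below uses the typical max_indy_rank of 0.84 from the description as a default; the function name is hypothetical.

```python
TYPICAL_MAX_INDY_RANK = 0.84  # "typically 0.84" per the field description


def decode_indy_rank(scaled_indy_rank: int,
                     max_indy_rank: float = TYPICAL_MAX_INDY_RANK) -> float:
    """Recover the independence rank from its 16-bit scaled representation."""
    if not 0 <= scaled_indy_rank <= 65535:
        raise ValueError("scaled_indy_rank must fit in 16 bits")
    return scaled_indy_rank * (max_indy_rank / 65536)
```

With the default maximum, a scaled value of 32768 decodes to roughly 0.42, half of the maximum.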
bodyWordsToTokensRatioBeginnumber(nilThe body words over tokens ratios for the beginning part and whole doc. NB: To save space, field body_words_to_tokens_ratio_total is not set if it has the same value as body_words_to_tokens_ratio_begin (e.g., short docs).
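Because bodyWordsToTokensRatioTotal is left unset when it equals bodyWordsToTokensRatioBegin, a reader has to fall back to the begin value. A minimal sketch of that fallback, with a hypothetical helper name:

```python
from typing import Optional


def effective_total_ratio(ratio_begin: Optional[float],
                          ratio_total: Optional[float]) -> Optional[float]:
    """Return the whole-document ratio, falling back to the begin ratio when unset."""
    if ratio_total is not None:
        return ratio_total
    # An unset total means it had the same value as the begin ratio (e.g. short docs).
    return ratio_begin
```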
topPetacatWeightnumber(nil
fireflySiteSignalQualityCopiaFireflySiteSignal →nilFull type: GoogleApi.ContentWarehouse.V1.Model.QualityCopiaFireflySiteSignal.tContains Site signal information for Firefly ranking change. See http://ariane/313938 for more details.
titleHardTokenCountWithoutStopwordsinteger(nilNumber of hard tokens originally in title without counting the stopwords.
hostNsrinteger(nilSite rank computed for host-level sitechunks. This value encodes nsr, site_pr and new_nsr. See quality_nsr::util::ConvertNsrDataToHostNsr and go/nsr. This field is deprecated - use the equivalent field inside nsr_data_proto instead.
semanticDateInfointeger(nilInfo is encoded using a SemanticDate specific format. Contains confidence scores for day/month/year components as well as various meta data required by the freshness twiddlers.
languageslist(integer(nilPlausible languages in order of decreasing plausibility. Language values are small, i.e. < 127, so this should compress to one byte each.
GroupsDataGroupsPerDocData →nilFull type: GoogleApi.ContentWarehouse.V1.Model.GroupsPerDocData.t16 bytes of groups2 data: used only in groups2 index
countryInfoCountryCountryAttachment →nilFull type: GoogleApi.ContentWarehouse.V1.Model.CountryCountryAttachment.tThis field stores the country information for the document in the form of CountryAttachment.
nilFull type: GoogleApi.ContentWarehouse.V1.Model.QualityGeoBrainlocBrainlocAttachment.tBrainloc contains location information for the document. See ariane/273189 for details.
ScaledLinkAgeSpamScoreinteger(nilLink age score is represented as a 7-bit integer, going from 0 to 127.
ScaledExptIndyRankinteger(nilDEPRECATED. Please do not use these fields in any new code. experimental
shingleInfoShingleInfoPerDocData →nilFull type: GoogleApi.ContentWarehouse.V1.Model.ShingleInfoPerDocData.t
productSitesInfoQualityProductProductSiteData →nilFull type: GoogleApi.ContentWarehouse.V1.Model.QualityProductProductSiteData.tThis field stores information about product sites.
spambrainDomainSitechunkDataSpamBrainData →nilFull type: GoogleApi.ContentWarehouse.V1.Model.SpamBrainData.tDomain sitechunk level scores coming from spambrain.
nilFull type: GoogleApi.ContentWarehouse.V1.Model.IndexingMobileVoltVoltPerDocData.tContains page UX signals for VOLT ranking change. See http://ariane/4025970 for more details.
timeSensitivityinteger(nilEncoded Document Time Sensitivity signal.
servingTimeClusterIdsIndexingDocjoinerServingTimeClusterIds →nilFull type: GoogleApi.ContentWarehouse.V1.Model.IndexingDocjoinerServingTimeClusterIds.tA set of cluster ids which are generated in Alexandria and used to de-dup results at serving time.