TrawlerTrawlerPrivateFetchReplyData

GoogleApi.ContentWarehouse.V1.Model.TrawlerTrawlerPrivateFetchReplyData

SEO Impact: 9 out of 10 (Critical)
This is an optional container of arbitrary data that can be added to a FetchReplyData. This data is meant to be logged, but not sent back in a fetch reply (it should be added after the reply is prepared). Use FetchResponsePreparatorImpl::AddTrawlerPrivateDataToFetchReplyData to add. See also the comment in fetch_response_preparator_impl.cc. Next Tag: 49

SEO Analysis

AI Generated

This model is part of Google's web crawling infrastructure (Trawler is Google's internal name for its web crawler). It records per-fetch metadata that is logged internally and never returned to the client, so it describes how individual fetches were carried out (cache hits, proxies, robots.txt handling, HTTP version). Crawl management as a whole directly affects how quickly new content is discovered and how often existing content is refreshed in the index.

Actionable Insights for SEOs

  • Monitor rankings and crawl stats for changes that may correlate with updates to this system
  • Consider how your server's fetch behavior (response headers, robots.txt, HTTP version) aligns with what this model records
  • Optimize crawl budget by fixing broken links and reducing redirect chains
  • Use robots.txt and sitemap.xml effectively to guide crawling
  • Monitor Google Search Console for crawl errors and indexing issues
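
As an illustration of the robots.txt advice above, here is a minimal sketch that checks whether a given user agent may fetch a URL under a robots.txt body. It uses Python's standard `urllib.robotparser`; the sample rules, the `example.com` URLs, and the helper name `is_allowed` are assumptions made for the example, and Python's matching logic is not guaranteed to match Googlebot's in every edge case.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body used only for this example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if user_agent may fetch url under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("https://example.com/blog/post", "https://example.com/private/page"):
        print(url, is_allowed(ROBOTS_TXT, "Googlebot", url))
```

Running a check like this against a staging copy of robots.txt before deploying it can catch rules that block more than intended.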

Attributes (48)

PostDataSize (string)
Default: nil | Full type: String.t

Size of the POST data in bytes, if this was a POST request.

numDroppedReplies (string)
Default: nil | Full type: String.t

Number of times we drop the content of a stream reply or the final reply, which can only be caused by REJECTED_NO_RPC_BUFFERS now.

HintIPAddress (string)
Default: nil | Full type: String.t

If we do not have Endpoints in FetchReplyData (e.g., url rejected due to hostload limit), do we have a guess of the server IPAddress (e.g., from robots fetch)? This helps us classify URLs based on country code, etc. The field is filled with IPAddress::ToPackedString().
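
The source does not define what `IPAddress::ToPackedString()` produces. The sketch below assumes it is the raw network-byte-order form of the address (4 bytes for IPv4, 16 for IPv6) and shows how such a value could be produced and decoded with Python's standard `ipaddress` module. The helper names `to_packed` and `from_packed` are hypothetical.

```python
import ipaddress


def to_packed(ip_text: str) -> bytes:
    """Assumed equivalent of a packed IP string: raw bytes in network order."""
    return ipaddress.ip_address(ip_text).packed


def from_packed(packed: bytes) -> str:
    """Decode 4 bytes as IPv4 or 16 bytes as IPv6 back to text form."""
    return str(ipaddress.ip_address(packed))


if __name__ == "__main__":
    packed = to_packed("192.0.2.1")
    print(packed, from_packed(packed))
```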

RpcStartDeadlineLeftMs (integer)
Default: nil

RPC deadline left at the start of the URL control flow. Useful for debugging RPC deadline-exceeded errors received by clients. This field is only recorded if RpcEndDeadlineLeftMs is small enough.

largeStoreHitLocation (string)
Default: nil | Full type: String.t

Set to the hit location (CNS filename) if cache comes from large store.

isDedicatedHostload (boolean)
Default: nil

dependentFetchType (string)
Default: nil | Full type: String.t

Dependent fetch type

isVpcTraffic (boolean)
Default: nil

Set if the fetch goes through the virtual private cloud path so we can track the VPC traffic.

httpVersion (string)
Default: nil | Full type: String.t

Stores the HTTP version we used in the last hop.

BotGroupName (string)
Default: nil | Full type: String.t

If we fetched using BotFetchAgent, what is the BotGroupName?

isBidiStreamingFetch (boolean)
Default: nil

Whether this is a bidirectional streaming fetch.

authenticationInfo (string)
Default: nil | Full type: String.t

Stores the OAuth authentication method.

RequestUserName (string)
Default: nil | Full type: String.t

The LOAS username, logged in Trawler private data to help with debugging. It is stored here so that clients do not see it in the FetchReply. To reduce disk usage, the LOAS username is only logged if the requestorid being used does not have ClientUsernameRestrictions.

cacheHitType (string)
Default: nil | Full type: String.t

Only set if the fetch uses cache content (is_cache_fetch is true).

Default: nil | Full type: GoogleApi.ContentWarehouse.V1.Model.TrawlerOriginalClientParams.t

Store the original client information.

IsRobotsFetch (boolean)
Default: nil

Was this an internally-initiated robots.txt fetch?

resourceBucket (string)
Default: nil | Full type: String.t

If the requestor shares resource bucket with other requestorids, we will store the resource bucket name in these fields.

cacheAcceptableAge (integer)
Default: nil

Corresponds to AcceptableAge field in FetchParams.

Producer (string)
Default: nil | Full type: String.t

Note TrawlerPrivateFetchReplyData is never sent back to clients. The following field is just for Trawler and Multiverse internal tracking, and clients should not look at this field at all.

ProxyInstance (string)
Default: nil | Full type: String.t

If set, this fetch was done through a proxy (e.g., fetchproxy).

cdnProvider (string)
Default: nil | Full type: String.t

concurrentStreamNum (string)
Default: nil | Full type: String.t

How many concurrent streams are on the connection when the request finishes (including this request). Export this value to monitor the stream multiplexing for HTTP/2.

cacheAcceptableAfterDate (integer)
Default: nil

Corresponds to AcceptableAfterDate field in FetchParams.

credentialId (string)
Default: nil | Full type: String.t

Log the credential id

ResponseBytes (string)
Default: nil | Full type: String.t

The number of bytes we sent back to the client.

downloadFileName (string)
Default: nil | Full type: String.t

If the response contains the Content-Disposition header 'attachment; filename="google.zip"', the download_file_name would be "google.zip".
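
How Trawler extracts the filename is not described in the source. As a rough sketch of the same idea, the standard library's `email.message.Message` can parse a Content-Disposition header value and return its `filename` parameter. The function name `download_file_name` is hypothetical and simply mirrors the field name.

```python
from email.message import Message
from typing import Optional


def download_file_name(content_disposition: str) -> Optional[str]:
    """Return the filename parameter of a Content-Disposition value, if any."""
    msg = Message()
    msg["Content-Disposition"] = content_disposition
    # get_filename() reads the "filename" parameter and returns None if absent.
    return msg.get_filename()


if __name__ == "__main__":
    print(download_file_name('attachment; filename="google.zip"'))
    print(download_file_name("inline"))
```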

isFloonetFetch (boolean)
Default: nil

Whether or not this is a Floonet fetch request. Floonet requests have inherently lower availability (due to HOPE rejections when HOPE is in degraded mode, and other Floonet-specific reasons). Therefore, it is important for debugging and for our availability SLO to know whether or not it is a Floonet fetch. IMPORTANT NOTE: This field is currently only set for traffic that explicitly requires Floonet and cannot fail over to use Googlebot (i.e. "transparent" or "implicit" Floonet fetches).

Default: nil | Full type: GoogleApi.ContentWarehouse.V1.Model.TrawlerMultiverseClientIdentifier.t

Multiverse client information

TrawlerInstance (string)
Default: nil | Full type: String.t

Which Trawler cell was this response fetched in? (e.g. "HR" or "YQ")

HSTSHeaderValue (string)
Default: nil | Full type: String.t

HTTP Strict-Transport-Security (RFC6797) header value. We log this so we can generate a list of hosts that prefer HTTPS over HTTP.
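
To make the shape of this value concrete, here is a minimal sketch that parses a Strict-Transport-Security header value into the two directives defined by RFC 6797, `max-age` and `includeSubDomains`. It is a simplified reader written for illustration, not Google's parser; the function name `parse_hsts` and the returned dictionary keys are assumptions.

```python
def parse_hsts(value: str) -> dict:
    """Parse an HSTS header value into max_age and include_subdomains."""
    directives = {}
    for part in value.split(";"):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        # Directive names are case-insensitive; values may be quoted.
        directives[name.strip().lower()] = arg.strip().strip('"')
    max_age = directives.get("max-age")
    return {
        "max_age": int(max_age) if max_age and max_age.isdigit() else None,
        "include_subdomains": "includesubdomains" in directives,
    }


if __name__ == "__main__":
    print(parse_hsts("max-age=31536000; includeSubDomains"))
```

A host that sends a positive `max-age` is asking clients to use HTTPS for that many seconds, which is the preference the description above refers to.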

tier (string)
Default: nil | Full type: String.t

Service tier info, used in the traffic grapher for plotting per-tier graphs.

Is5xxHostId (boolean)
Default: nil

Indicates whether the HostId belongs to the HostId set in the 5xx URL patterns. It can work as a tag when emitting the requestor minute summary, which helps us aggregate traffic affected by 5xx patterns and test whether there are any fetching changes.

UserAgentSent (string)
Default: nil | Full type: String.t

The user-agent string sent to the remote webserver. It corresponds to the UserAgentToSend field in FetchParams.

googleExtendedObeyWildcardRobotsStatus (integer)
Default: nil

Whether Google-Extended is allowed to crawl this URL, with wildcard rules obeyed. This is for internal analysis. See RobotsTxtClient::RobotsStatus for the meaning of the number.

RobotsBody (string)
Default: nil | Full type: String.t

If this was a robots.txt fetch (IsRobotsFetch above), this may contain the robots.txt body. (It may not; for instance, 404s are omitted. The current policy is URL_CRAWLED + partially crawled.) This includes the HTTP headers and body.

UserAgentSentFp (string)
Default: nil | Full type: String.t

The fp2011 of the user-agent string sent to the remote webserver. Note that it corresponds to the UserAgentToSend field in FetchParams.

prodRegion (string)
Default: nil | Full type: String.t

Log the prod region (only for regional harpoon requestor ids)

RpcEndDeadlineLeftMs (integer)
Default: nil

RPC deadline left at the end of the URL control flow. Useful for debugging RPC deadline-exceeded errors received by clients. This field is only recorded if it is small enough.

isFromGrpcProxy (boolean)
Default: nil

Whether or not this response is sent from gRPC proxy service.

ServerSignature (string)
Default: nil | Full type: String.t

An arbitrary string signature identifying the remote server type/version. In the case of HTTP, this would be the contents of the "Server:" header.

googleExtendedRobotsStatus (integer)
Default: nil

Whether Google-Extended is allowed to crawl this URL, with wildcard rules not obeyed. This is for internal analysis. See RobotsTxtClient::RobotsStatus for the meaning of the number.
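
The source does not say what "wildcard rules" means for googleExtendedRobotsStatus and googleExtendedObeyWildcardRobotsStatus. The sketch below assumes one plausible reading: the `User-agent: *` group is either used as a fallback (obeyed) or ignored (not obeyed) when no group names Google-Extended explicitly. It uses a small hand-written parser because Python's `urllib.robotparser` always falls back to the `*` group. The function names, the boolean result (the real fields hold a RobotsTxtClient::RobotsStatus number), and the simplified prefix matching are all assumptions.

```python
def parse_groups(robots_txt: str) -> dict:
    """Map each lowercased user-agent token to a list of (allow, path_prefix) rules."""
    groups = {}
    current_agents = []
    in_rules = False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            if in_rules:  # a user-agent line after rules starts a new group
                current_agents = []
                in_rules = False
            current_agents.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif key in ("allow", "disallow"):
            in_rules = True
            if value:  # an empty Disallow restricts nothing
                for agent in current_agents:
                    groups[agent].append((key == "allow", value))
    return groups


def is_allowed(robots_txt: str, agent: str, path: str, obey_wildcard: bool) -> bool:
    """Check path for agent; fall back to the '*' group only if obey_wildcard is set."""
    groups = parse_groups(robots_txt)
    rules = groups.get(agent.lower())
    if rules is None and obey_wildcard:
        rules = groups.get("*")
    if not rules:
        return True
    matches = [(len(prefix), allow) for allow, prefix in rules if path.startswith(prefix)]
    # Longest matching prefix wins; on a tie, Allow (True) sorts above Disallow.
    return max(matches)[1] if matches else True


if __name__ == "__main__":
    robots = "User-agent: *\nDisallow: /\n"
    print(is_allowed(robots, "Google-Extended", "/page", obey_wildcard=True))
    print(is_allowed(robots, "Google-Extended", "/page", obey_wildcard=False))
```

Under this reading, a site that blocks everything via `User-agent: *` but never mentions Google-Extended would get different results in the two fields.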

BotHostname (string)
Default: nil | Full type: String.t

This is the HOPE server that we sent the URL to. We log the HOPE backend cell and HOPE server shard number (e.g., 'qf:6'). This allows us to understand how we are balancing our load across the HOPE servers.
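
Based only on the 'qf:6' example in the description, the value appears to be a cell name and a shard number separated by a colon. The sketch below assumes that format and shows how logged values could be tallied per shard to inspect load balance. The function names and the strictness of the validation are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple


def parse_bot_hostname(value: str) -> Tuple[str, int]:
    """Split an assumed 'cell:shard' value such as 'qf:6' into (cell, shard)."""
    cell, sep, shard = value.partition(":")
    if not sep or not cell or not shard.isdigit():
        raise ValueError(f"unexpected BotHostname format: {value!r}")
    return cell, int(shard)


def shard_load(values: Iterable[str]) -> Counter:
    """Count fetches per (cell, shard) pair."""
    return Counter(parse_bot_hostname(v) for v in values)


if __name__ == "__main__":
    print(shard_load(["qf:6", "qf:6", "qf:7"]))
```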

subResourceBucket (string)
Default: nil | Full type: String.t

Default: nil | Full type: GoogleApi.ContentWarehouse.V1.Model.TrawlerLoggedVPCDestination.t

The following is VPC information that is only set if is_vpc_traffic is true.

bypassedHostOverfull (boolean)
Default: nil

A cache hit for this URL bypassed the host_overfull error.

CacheRequestorID (string)
Default: nil | Full type: String.t

Present if the reply is from the trawler cache. This is the requestorid of the trawler client that populated the cache with the data we are reusing.

HadInMemCacheHit (boolean)
Default: nil

FetcherTaskNumber (integer)
Default: nil

Which Trawler fetcher task fetched this URL.