NlpSaftDocument

NLP SAFT | Infrastructure

GoogleApi.ContentWarehouse.V1.Model.NlpSaftDocument

SEO Impact: 5 out of 10 (Medium)
A document contains the raw text contents of the document as well as an analysis. The document can be split into tokens which can contain information about POS tags and dependency relations. The document can also contain entities and mentions of these entities in the document. Next available id: 36
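
The layering described above (raw text, tokens with optional POS tags and dependency heads, entities with their mentions) can be pictured with a small toy model. This is only an illustrative sketch: the class names, the field names, and the choice of inclusive token-index spans for mentions are all assumptions made for the example, not the actual NlpSaftDocument schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Token:
    word: str
    pos_tag: Optional[str] = None  # part-of-speech tag, if annotated (hypothetical field)
    head: Optional[int] = None     # index of the dependency head token, if parsed (hypothetical field)


@dataclass
class Mention:
    begin: int  # index of the first token of the mention (assumed convention)
    end: int    # index of the last token of the mention, inclusive (assumed convention)


@dataclass
class Entity:
    name: str
    mentions: List[Mention] = field(default_factory=list)


@dataclass
class Document:
    text: str
    tokens: List[Token] = field(default_factory=list)
    entities: List[Entity] = field(default_factory=list)


def mention_words(doc: Document, mention: Mention) -> List[str]:
    """Return the words covered by a mention's token span."""
    return [t.word for t in doc.tokens[mention.begin:mention.end + 1]]


# Made-up sample data for illustration only.
doc = Document(
    text="Ada Lovelace wrote notes.",
    tokens=[
        Token("Ada", "NOUN", 1),
        Token("Lovelace", "NOUN", 2),
        Token("wrote", "VERB", None),  # root of the sentence: no head
        Token("notes", "NOUN", 2),
        Token(".", "PUNCT", 2),
    ],
    entities=[Entity("Ada Lovelace", [Mention(0, 1)])],
)
words = mention_words(doc, doc.entities[0].mentions[0])
```

The point of the sketch is that mentions refer back to token positions rather than duplicating text, so one tokenization can be shared by every annotation layer.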

SEO Analysis

AI Generated

Backend infrastructure with indirect SEO impact. This model (NlpSaftDocument) contains 33 attributes that define its data structure, including the relations between entities in the document.

Actionable Insights for SEOs

  • Understanding this model helps SEOs grasp Google's internal data architecture
  • Consider how this system might interact with other ranking signals

Attributes (33)

Default: nil
Full type: list(GoogleApi.ContentWarehouse.V1.Model.NlpSaftRelation.t)

Relations between entities in the document.

Default: nil
Full type: GoogleApi.ContentWarehouse.V1.Model.Proto2BridgeMessageSet.t

Generic annotations.

contentage (string)
Default: nil
Full type: String.t

Age of the content of the document. For details, see quality/historical/shingle/signals/contentage.proto. The format has been translated to a canonical timestamp (seconds since epoch).
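
Several fields on this model carry a seconds-since-epoch timestamp typed as a string. A minimal sketch of turning such a value into a timezone-aware datetime follows. It assumes the string holds a plain decimal integer and that an unset field arrives as None or an empty string; the helper name and the sample value are hypothetical.

```python
from datetime import datetime, timezone


def epoch_string_to_utc(value):
    """Convert a decimal seconds-since-epoch string to an aware UTC datetime.

    Returns None when the field is unset (None) or empty.
    """
    if value is None or value == "":
        return None
    return datetime.fromtimestamp(int(value), tz=timezone.utc)


# Hypothetical sample value, not taken from any real document.
content_age = epoch_string_to_utc("1700000000")
print(content_age.isoformat())  # 2023-11-14T22:13:20+00:00
```

Keeping the result in UTC avoids silently mixing local time zones when this value is compared with other epoch-based fields.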

bylineDate (string)
Default: nil
Full type: String.t

Document's byline date, if available: this is the date that will be shown in the snippets in web search results. It is stored as the number of seconds since epoch. See segindexer/compositedoc.proto

date (string)
Default: nil
Full type: String.t

Document anchor date in YYYYMMDDhhmmss format.
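
Unlike the epoch-based fields, this value is a fixed-width digit string. A minimal parsing sketch is shown below. The helper name and sample value are hypothetical, and because the description does not state a time zone, the result is deliberately left as a naive datetime.

```python
from datetime import datetime


def parse_anchor_date(value):
    """Parse a YYYYMMDDhhmmss string into a naive datetime (time zone unknown)."""
    return datetime.strptime(value, "%Y%m%d%H%M%S")


# Hypothetical sample value.
anchor = parse_anchor_date("20240131235959")
print(anchor)  # 2024-01-31 23:59:59
```

A malformed or impossible value (for example month 13) makes `strptime` raise `ValueError`, which is usually preferable to accepting a bad date silently.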

Default: nil
Full type: list(GoogleApi.ContentWarehouse.V1.Model.NlpSaftEntity.t)

Entities in the document.

Default: nil
Full type: list(GoogleApi.ContentWarehouse.V1.Model.NlpSaftSemanticNode.t)

The semantic nodes for the document represent arbitrary types of higher-level abstractions beyond entity mention coreference and binary relations between entities. These may include: n-ary relations, semantic frames or events. The semantic nodes for a document are the nodes in a directed acyclic graph, with an adjacency list representation.
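
To make the "directed acyclic graph with an adjacency list" idea concrete, here is a small sketch that orders nodes so each one comes before the nodes it points to. The representation is an assumption for illustration: node i is simply position i in a list, and `adjacency[i]` lists the indices it has arcs to. The real NlpSaftSemanticNode fields are not shown in this section and may differ.

```python
def topological_order(adjacency):
    """Return node indices so that every node precedes the nodes it points to.

    `adjacency[i]` is the list of node indices that node i has arcs to.
    The input is assumed to be acyclic.
    """
    visited = set()
    order = []

    def visit(node):
        if node in visited:
            return
        visited.add(node)
        for child in adjacency[node]:
            visit(child)
        order.append(node)  # post-order: appended after all descendants

    for node in range(len(adjacency)):
        visit(node)
    order.reverse()  # reversed post-order is a topological order
    return order


# Hypothetical graph: node 0 points to 1 and 2, which both point to 3.
adjacency = [[1, 2], [3], [3], []]
print(topological_order(adjacency))  # [0, 2, 1, 3]
```

Node 3 is shared by two parents, which is exactly what a DAG allows and a tree does not; that is one way such a structure can express n-ary relations or events with shared arguments.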

lastSignificantUpdate (string)
Default: nil
Full type: String.t

Last significant update of the page content, in the same format as the contentage field, and also derived from ContentAge.last_significant_update in quality/historical/shingle/signals/contentage.proto.

Default: nil
Full type: list(GoogleApi.ContentWarehouse.V1.Model.NlpSaftToken.t)

Tokenization of the document.

Default: nil
Full type: list(GoogleApi.ContentWarehouse.V1.Model.NlpSaftMeasure.t)

Measures in the document. This covers both time expressions and physical quantities.

Default: nil
Full type: list(GoogleApi.ContentWarehouse.V1.Model.NlpSaftHyperlink.t)

The hyperlinks in the document. Multiple hyperlinks are sorted in left-to-right order.

Default: nil
Full type: list(GoogleApi.ContentWarehouse.V1.Model.NlpSaftAnnotatedPhrase.t)

Annotated phrases in the document that are not semantically well-defined mentions of entities.

contentFirstseen (string)
Default: nil
Full type: String.t

Stores the earlier of two times: the first time Google successfully crawled the document, or the first time it indexed the document with contents (i.e., not roboted). It is stored as the number of seconds since epoch. See quality/historical/signals/firstseen/firstseen.proto

contentType (integer)
Default: nil

Optional document content_type (from webutil/http/content-type.proto). Used for setting the content_type when converting the SAFT Document to a CompositeDoc. Will be inferred if not given here.

entityLabel (string)
Default: nil
Full type: list(String.t)

Entity labels used in this document. This field is used to define labels for the Entity::entity_type_probability field, which contains corresponding probabilities. WARNING: This field is deprecated. go/saft-replace-deprecated-entity-type
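
The relationship between document-level labels and per-entity probabilities can be sketched as a pairing of two parallel lists. Note the assumption: the description only says the probabilities "correspond" to the labels, so pairing them by position is a guess made for this example, and the helper name and sample values are hypothetical.

```python
def label_probabilities(entity_labels, probabilities):
    """Pair document-level labels with one entity's probabilities by position."""
    if len(entity_labels) != len(probabilities):
        raise ValueError("labels and probabilities must have the same length")
    return dict(zip(entity_labels, probabilities))


# Made-up labels and probabilities for a single entity.
labels = ["PERSON", "LOCATION", "ORGANIZATION"]
probs = [0.7, 0.1, 0.2]
by_label = label_probabilities(labels, probs)
best = max(by_label, key=by_label.get)
print(best)  # PERSON
```

Storing the label list once per document and only numbers per entity keeps each entity compact, at the cost of requiring the two lists to stay aligned.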

httpHeaders (string)
Default: nil
Full type: String.t

HTTP headers for the document. If this field is set, it should contain the complete header, including the HTTP status line and the trailing CR/LF. HTTP headers are not required to be valid UTF-8. Per the HTTP/1.1 syntax standard (RFC 7230), non-ASCII octets should be treated as opaque data.
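
Because the header block may contain non-UTF-8 octets, it is safest to handle it as bytes rather than decoding it up front. The sketch below splits a raw block into its status line and header pairs without decoding any values. It assumes the caller already holds the header as raw bytes; how this field is encoded in transit is not stated here. The helper name and sample data are hypothetical.

```python
def split_http_header_block(raw):
    """Split a raw header block (bytes) into a status line and (name, value) pairs.

    Values are kept as bytes so that non-ASCII octets stay opaque.
    """
    lines = raw.split(b"\r\n")
    status_line = lines[0]
    headers = []
    for line in lines[1:]:
        if not line:
            break  # the blank line ends the header section
        name, _, value = line.partition(b":")
        headers.append((name.strip().lower(), value.strip()))
    return status_line, headers


# Made-up header block; \xe9 is a Latin-1 byte that is not valid UTF-8 here.
raw = b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\nX-Note: caf\xe9\r\n\r\n"
status, headers = split_http_header_block(raw)
print(status)   # b'HTTP/1.1 200 OK'
print(headers)  # [(b'content-type', b'text/html'), (b'x-note', b'caf\xe9')]
```

Header names are lowercased because HTTP field names are case-insensitive; values are left untouched.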

Default: nil
Full type: list(GoogleApi.ContentWarehouse.V1.Model.NlpSaftDocumentTopic.t)

docid (string)
Default: nil
Full type: String.t

Identifier for document.

language (integer)
Default: nil

Document language (default is English). This field's value maps cleanly to the i18n.languages.Language proto enum (i18n::languages::Language in C++).

text (string)
Default: nil
Full type: String.t

Raw text contents of document. (In docjoin attachments from the SAFT goldmine annotator this field will be empty.)

trace (boolean)
Default: nil

Whether to enable component tracing during analysis of this document. See http://go/saft-tracing for details.

labeledSpans (string)
Default: nil
Full type: %{optional(String.t

Generic labeled spans (produced by the span labeling framework, go/saft-span-labeling). The map key identifies spans of the same type. By convention, it should be of the form "team_name/span_type_name".
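
The key convention can be checked and unpacked with a few lines of code. This sketch assumes the key contains exactly the two parts named in the convention, separated by the first slash; the helper name and the sample key are hypothetical.

```python
def split_span_key(key):
    """Split a "team_name/span_type_name" map key into its two parts."""
    team, sep, span_type = key.partition("/")
    if not sep or not team or not span_type:
        raise ValueError(f"key does not follow team_name/span_type_name: {key!r}")
    return team, span_type


# Hypothetical key following the stated convention.
print(split_span_key("my_team/price_span"))  # ('my_team', 'price_span')
```

Prefixing the span type with a team name acts as a namespace, so independent producers can add span types to the same map without colliding.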

golden (boolean)
Default: nil

Flag for indicating that the document is a gold-standard document. This can be used for putting additional weight on human-labeled documents in contrast to automatically labeled annotations.

focusEntity (integer)
Default: nil

Focus entity. For lexicon articles, like Wikipedia pages, a document is often about a certain entity. This is the local entity id of the focus entity for the document.

constituencyRoot (list(integer))
Default: nil

The root node of the constituency tree for each sentence. If non-empty, the list of roots will be aligned with the sentences in the document. Note that some sentences may not have been parsed for various reasons; these sentences will be annotated with placeholder "stub parses". For details, see //nlp/saft/components/constituents/util/stub-parse.h.

author (string)
Default: nil
Full type: list(String.t)

Document author(s).

syntacticDate (string)
Default: nil
Full type: String.t

Document's syntactic date (e.g. date explicitly mentioned in the URL of the document or in the document title). It is stored as the number of seconds since epoch. See quality/timebased/syntacticdate/proto/syntactic-date.proto

url (string)
Default: nil
Full type: String.t

Source document URL.

privacySensitive (boolean)
Default: nil

True if this document contains privacy sensitive data. When the document is transferred in RPC calls the RPC should use SSL_PRIVACY_AND_INTEGRITY security level.

Default: nil
Full type: list(GoogleApi.ContentWarehouse.V1.Model.NlpSaftDocument.t)

Subsections of the document, used to divide it into volumes, parts, chapters, sections, etc.
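
Since each subsection is itself a document of the same type, the structure is recursive and can be walked depth-first. The sketch below builds an outline from nested dictionaries. The dictionary layout and the `subsections` key are assumptions made for this example, not the model's actual field names.

```python
def walk_sections(doc, depth=0):
    """Yield (depth, title) for a document and all nested subsections, depth-first."""
    yield depth, doc.get("title")
    # A missing or None list is treated as "no subsections".
    for child in doc.get("subsections") or []:
        yield from walk_sections(child, depth + 1)


# Made-up nested document for illustration.
book = {
    "title": "Book",
    "subsections": [
        {"title": "Part I", "subsections": [{"title": "Chapter 1"}, {"title": "Chapter 2"}]},
        {"title": "Part II", "subsections": None},
    ],
}
outline = list(walk_sections(book))
for depth, title in outline:
    print("  " * depth + title)
```

Treating None as an empty list mirrors the nil defaults listed for the attributes on this page.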

Default: nil
Full type: list(GoogleApi.ContentWarehouse.V1.Model.NlpSaftConstituencyNode.t)

Constituency parse tree nodes for the sentences in this document.

rpcError (boolean)
Default: nil

True if some RPC which touched this document had an error.

title (string)
Default: nil
Full type: String.t

Optional document title.