NlpSaftDocument
NLP SAFT | Infrastructure | GoogleApi.ContentWarehouse.V1.Model.NlpSaftDocument
SEO Analysis
AI Generated: Backend infrastructure with indirect SEO impact. This model (NlpSaftDocument) contains 33 attributes that define its data structure. Its key functionality includes recording relations between entities in the document.
Actionable Insights for SEOs
- Understanding this model helps SEOs grasp Google's internal data architecture
- Consider how this system might interact with other ranking signals
Attributes
relation: NlpSaftRelation, default nil. Full type: list(GoogleApi.ContentWarehouse.V1.Model.NlpSaftRelation.t()). Relations between entities in the document.
annotations: Proto2BridgeMessageSet, default nil. Full type: GoogleApi.ContentWarehouse.V1.Model.Proto2BridgeMessageSet.t(). Generic annotations.
contentage: string, default nil. Full type: String.t(). Age of the content of the document. For details, see: quality/historical/shingle/signals/contentage.proto. The format has been translated to a canonical timestamp (seconds since epoch).
bylineDate: string, default nil. Full type: String.t(). Document's byline date, if available: this is the date that will be shown in the snippets in web search results. It is stored as the number of seconds since epoch. See segindexer/compositedoc.proto.
date: string, default nil. Full type: String.t(). Document anchor date in YYYYMMDDhhmmss format.
entity: NlpSaftEntity, default nil. Full type: list(GoogleApi.ContentWarehouse.V1.Model.NlpSaftEntity.t()). Entities in the document.
semanticNode: NlpSaftSemanticNode, default nil. Full type: list(GoogleApi.ContentWarehouse.V1.Model.NlpSaftSemanticNode.t()). The semantic nodes for the document represent arbitrary types of higher-level abstractions beyond entity mention coreference and binary relations between entities. These may include n-ary relations, semantic frames, or events. The semantic nodes for a document are the nodes in a directed acyclic graph, with an adjacency list representation.
lastSignificantUpdate: string, default nil. Full type: String.t(). Last significant update of the page content, in the same format as the contentage field, and also derived from ContentAge.last_significant_update in quality/historical/shingle/signals/contentage.proto.
token: NlpSaftToken, default nil. Full type: list(GoogleApi.ContentWarehouse.V1.Model.NlpSaftToken.t()). Tokenization of the document.
measure: NlpSaftMeasure, default nil. Full type: list(GoogleApi.ContentWarehouse.V1.Model.NlpSaftMeasure.t()). Measures in the document. This covers both time expressions and physical quantities.
hyperlink: NlpSaftHyperlink, default nil. Full type: list(GoogleApi.ContentWarehouse.V1.Model.NlpSaftHyperlink.t()). The hyperlinks in the document. Multiple hyperlinks are sorted in left-to-right order.
annotatedPhrase: NlpSaftAnnotatedPhrase, default nil. Full type: list(GoogleApi.ContentWarehouse.V1.Model.NlpSaftAnnotatedPhrase.t()). Annotated phrases in the document that are not semantically well-defined mentions of entities.
contentFirstseen: string, default nil. Full type: String.t(). Stores the minimum of the first time Google successfully crawled a document and the first time it indexed the document with contents (i.e., not roboted). It is stored as the number of seconds since epoch. See quality/historical/signals/firstseen/firstseen.proto.
contentType: integer(), default nil. Optional document content_type (from webutil/http/content-type.proto). Used for setting the content_type when converting the SAFT Document to a CompositeDoc. Will be inferred if not given here.
entityLabel: string, default nil. Full type: list(String.t()). Entity labels used in this document. This field is used to define labels for the Entity::entity_type_probability field, which contains corresponding probabilities. WARNING: This field is deprecated. go/saft-replace-deprecated-entity-type
httpHeaders: string, default nil. Full type: String.t(). HTTP header for document. If the HTTP headers field is set, it should be the complete header, including the HTTP status line and the trailing cr/nl. HTTP headers are not required to be valid UTF-8. Per the HTTP/1.1 Syntax (RFC 7230) standard, non-ASCII octets should be treated as opaque data.
(unnamed): default nil. Full type: list(GoogleApi.ContentWarehouse.V1.Model.NlpSaftDocumentTopic.t()).
docid: string, default nil. Full type: String.t(). Identifier for document.
language: integer(), default nil. Document language (default is English). This field's value maps cleanly to the i18n.languages.Language proto enum (i18n::languages::Language in C++).
text: string, default nil. Full type: String.t(). Raw text contents of document. (In docjoin attachments from the SAFT goldmine annotator this field will be empty.)
trace: boolean(), default nil. Whether to enable component tracing during analysis of this document. See http://go/saft-tracing for details.
labeledSpans: string, default nil. Full type: %{optional(String.t()) => …}. Generic labeled spans (produced by the span labeling framework, go/saft-span-labeling). The map key identifies spans of the same type. By convention, it should be of the form "team_name/span_type_name".
golden: boolean(), default nil. Flag for indicating that the document is a gold-standard document. This can be used for putting additional weight on human-labeled documents in contrast to automatically labeled annotations.
focusEntity: integer(), default nil. Focus entity. For lexicon articles, like Wikipedia pages, a document is often about a certain entity. This is the local entity id of the focus entity for the document.
constituencyRoot: list(integer()), default nil. The root node of the constituency tree for each sentence. If non-empty, the list of roots will be aligned with the sentences in the document. Note that some sentences may not have been parsed for various reasons; these sentences will be annotated with placeholder "stub parses". For details, see //nlp/saft/components/constituents/util/stub-parse.h.
author: string, default nil. Full type: list(String.t()). Document author(s).
syntacticDate: string, default nil. Full type: String.t(). Document's syntactic date (e.g. date explicitly mentioned in the URL of the document or in the document title). It is stored as the number of seconds since epoch. See quality/timebased/syntacticdate/proto/syntactic-date.proto.
url: string, default nil. Full type: String.t(). Source document URL.
privacySensitive: boolean(), default nil. True if this document contains privacy sensitive data. When the document is transferred in RPC calls the RPC should use SSL_PRIVACY_AND_INTEGRITY security level.
subsection: NlpSaftDocument, default nil. Full type: list(GoogleApi.ContentWarehouse.V1.Model.NlpSaftDocument.t()). Sub-sections for document for dividing a document into volumes, parts, chapters, sections, etc.
constituencyNode: NlpSaftConstituencyNode, default nil. Full type: list(GoogleApi.ContentWarehouse.V1.Model.NlpSaftConstituencyNode.t()). Constituency parse tree nodes for the sentences in this document.
rpcError: boolean(), default nil. True if some RPC which touched this document had an error.
title: string, default nil. Full type: String.t(). Optional document title.
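
Several attributes above store dates as strings (seconds since epoch for contentage, bylineDate, lastSignificantUpdate, contentFirstseen and syntacticDate; YYYYMMDDhhmmss for date), and subsection nests documents recursively. The following is a minimal Python sketch of how such values might be decoded, assuming the document is available as a plain dict keyed by the attribute names listed here. That dict shape, the helper names (parse_epoch_seconds, parse_anchor_date, decode_dates, walk_subsections), the sample values, and the choice of UTC for the anchor date are all assumptions made for illustration; none of them come from the source.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Iterator, Optional, Tuple

# Attributes the list above describes as seconds since epoch, stored as strings.
EPOCH_FIELDS = (
    "contentage",
    "bylineDate",
    "lastSignificantUpdate",
    "contentFirstseen",
    "syntacticDate",
)


def parse_epoch_seconds(value: Optional[str]) -> Optional[datetime]:
    """Convert a seconds-since-epoch string into an aware UTC datetime."""
    if not value:
        return None
    return datetime.fromtimestamp(int(value), tz=timezone.utc)


def parse_anchor_date(value: Optional[str]) -> Optional[datetime]:
    """Parse the `date` attribute (YYYYMMDDhhmmss).

    The source does not state a time zone; UTC is an assumption here.
    """
    if not value:
        return None
    return datetime.strptime(value, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)


def decode_dates(doc: Dict[str, Any]) -> Dict[str, Optional[datetime]]:
    """Decode every date-like attribute; missing attributes map to None."""
    decoded = {name: parse_epoch_seconds(doc.get(name)) for name in EPOCH_FIELDS}
    decoded["date"] = parse_anchor_date(doc.get("date"))
    return decoded


def walk_subsections(
    doc: Dict[str, Any], depth: int = 0
) -> Iterator[Tuple[int, Dict[str, Any]]]:
    """Yield (depth, document) for a document and its subsections, depth-first."""
    yield depth, doc
    for child in doc.get("subsection") or []:
        yield from walk_subsections(child, depth + 1)


# Hypothetical sample document; the values are invented for illustration.
sample = {
    "title": "Book",
    "bylineDate": "1700000000",
    "date": "20231114221320",
    "subsection": [
        {"title": "Chapter 1", "subsection": [{"title": "Section 1.1"}]},
        {"title": "Chapter 2"},
    ],
}

dates = decode_dates(sample)
print(dates["bylineDate"].isoformat())  # 2023-11-14T22:13:20+00:00
for depth, section in walk_subsections(sample):
    print("  " * depth + section.get("title", "(untitled)"))
```

Because every attribute defaults to nil, the helpers return None for absent values instead of raising, and walk_subsections treats a missing subsection list as empty.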