NlpSaftToken

NLP Infrastructure

GoogleApi.ContentWarehouse.V1.Model.NlpSaftToken

SEO Impact: Low

A document token marks a span of bytes in the document text as a token or word. Next available index: 16.

SEO Analysis

AI Generated

Backend infrastructure with indirect SEO impact. This model (NlpSaftToken) defines 15 attributes that describe a single token: its byte span in the document text, word form, lemma, part-of-speech tag, dependency head and label, and break information such as whether the break skipped over non-tag text (excluding script/style).

Actionable Insights for SEOs

Understanding this model helps SEOs grasp Google's internal data architecture
Consider how this system might interact with other ranking signals

Attributes


breakLevel (string)

Default: nil. Full type: String.t

breakSkippedText (boolean)

Default: nil

Whether the break skipped over non-tag text (excluding script/style).

category (string)

Default: nil. Full type: String.t

Coarse-grained word category for token. See README.categories for category inventory.

end (integer)

Default: nil

head (integer)

Default: nil

Head of this token in the dependency tree: the id of the token which has an arc going to this one. If it is the root token of a sentence, then it is set to -1.
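The head convention can be illustrated with a small sketch. It assumes, purely for illustration, that a token's id is its position in the sentence's token list; the source does not state how ids are assigned. The helper names build_children and find_roots are hypothetical.

```python
from collections import defaultdict

ROOT = -1  # per the description above, a sentence root has head == -1


def build_children(heads):
    """Map each head id to the list of token ids that depend on it.

    Assumption: a token's id is its index in `heads`.
    """
    children = defaultdict(list)
    for token_id, head in enumerate(heads):
        children[head].append(token_id)
    return dict(children)


def find_roots(heads):
    """Return the ids of all tokens whose head is the ROOT marker."""
    return [token_id for token_id, head in enumerate(heads) if head == ROOT]


# "The cat sleeps": "The" depends on "cat", "cat" depends on "sleeps",
# and "sleeps" is the sentence root.
heads = [1, 2, ROOT]
roots = find_roots(heads)
children = build_children(heads)
```

Storing only the head per token keeps the tree compact; the child lists are derived in a single pass when a traversal needs them.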

info (Proto2BridgeMessageSet)

Default: nil. Full type: GoogleApi.ContentWarehouse.V1.Model.Proto2BridgeMessageSet.t

Annotation for this token.

label (string)

Default: nil. Full type: String.t

Label for dependency relation between this token and its head. See README.labels for label inventory.

lemma (string)

Default: nil. Full type: String.t

Word lemma. This is only filled if the lemma is different from the word form.
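Because the lemma is only filled when it differs from the word form, a consumer would need a fallback to get a lemma for every token. A minimal sketch, where effective_lemma is a hypothetical helper name and an unset lemma is assumed to arrive as None or an empty string:

```python
from typing import Optional


def effective_lemma(word: str, lemma: Optional[str]) -> str:
    """Return the lemma if set, otherwise fall back to the word form."""
    # An unset lemma means the lemma equals the word form.
    return lemma if lemma else word
```

For example, effective_lemma("running", "run") yields the stored lemma, while effective_lemma("cat", None) falls back to the word form.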

morph (NlpSaftMorphology)

Default: nil. Full type: GoogleApi.ContentWarehouse.V1.Model.NlpSaftMorphology.t

Morphology information.

scriptCode (string)

Default: nil. Full type: String.t

A string representation (typically four letters, sometimes longer) of the token's Unicode script code, based on BCP 47/CLDR, capitalized according to ISO 15924. See i18n/identifiers/scriptcode.h for details.

start (integer)

Default: nil

[start, end] describe the inclusive byte range of the UTF-8 encoded token in document.text. End gives the index of the last byte, which may be a UTF-8 continuation byte, and the length in bytes is end - start + 1. begin/end options are for goldmine AnnotationsFinder to locate the offsets of saft tokens. Start is inclusive by default and end is marked.
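The inclusive byte range is easy to get wrong by one, so here is a minimal sketch of recovering a token from the document text. The function names token_bytes and token_text are hypothetical, and the sample text is invented for illustration.

```python
def token_bytes(document_text: str, start: int, end: int) -> bytes:
    """Return the raw UTF-8 bytes of a token given its inclusive [start, end] range."""
    encoded = document_text.encode("utf-8")
    # `end` is the index of the last byte, so the slice must run to end + 1.
    return encoded[start:end + 1]


def token_text(document_text: str, start: int, end: int) -> str:
    """Decode the token's bytes back into a string."""
    return token_bytes(document_text, start, end).decode("utf-8")


# "naïve café": the accented letters each take two bytes in UTF-8, so byte
# offsets diverge from character offsets.
text = "na\u00efve caf\u00e9"
first = token_text(text, 0, 5)    # 6 bytes: n, a, ï (2 bytes), v, e
second = token_text(text, 7, 11)  # 5 bytes: c, a, f, é (2 bytes)
second_length = 11 - 7 + 1        # length in bytes is end - start + 1
```

Slicing the encoded bytes rather than the string is the point: indexing the Python str directly with these offsets would return the wrong characters once any multi-byte character precedes the token.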

tag (string)

Default: nil. Full type: String.t

Part-of-speech tag for token. See README.tags for tag inventory.

tagConfidence (number)

Default: nil

Confidence score for the tag prediction -- should be interpreted as a probability estimate that the tag is correct.

textProperties (integer)

Default: nil

word (string)

Default: nil. Full type: String.t

Token word form. This may not be identical to the original text. For example, in goldmine annotation we do UTF-8 normalization and punctuation normalization. The punctuation normalization includes inferring the directionality of straight double quotes: we map " to an open quote (``) or a close quote (''), and sometimes we get it wrong. SAFT processing in other contexts (such as queries in qrewrite) involves different normalizations.
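To make the quote-directionality idea concrete, here is a deliberately naive sketch. It is not the goldmine normalization, whose rules the source does not describe; the heuristic (a straight quote opens when it follows whitespace or an opening bracket, and closes otherwise) and the name normalize_double_quotes are assumptions made for illustration.

```python
def normalize_double_quotes(text: str) -> str:
    """Map each straight double quote to `` (open) or '' (close).

    Hypothetical heuristic: a quote opens if it is at the start of the text
    or follows whitespace or an opening bracket; otherwise it closes.
    """
    out = []
    for i, ch in enumerate(text):
        if ch != '"':
            out.append(ch)
            continue
        prev = text[i - 1] if i > 0 else " "
        opens = prev.isspace() or prev in "([{"
        out.append("``" if opens else "''")
    return "".join(out)


quoted = normalize_double_quotes('He said "hi" twice')
inch_mark = normalize_double_quotes('a 5" pipe')
```

The second call shows how such inference can go wrong: the inch mark after the digit is treated as a close quote even though it is not a quotation mark at all.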