GoodocWord
GoogleApi.ContentWarehouse.V1.Model.GoodocWord
SEO Analysis
AI-generated summary: Backend infrastructure with indirect SEO impact. This model (GoodocWord) contains SEO-relevant attributes, including Penalty. Key functionality includes recording the baseline's y-axis offset from the bottom of the word's bounding box, in pixels (for instance, a value of 2 places the baseline 2 px above the bottom of the box).
Actionable Insights for SEOs
- Understanding this model helps SEOs grasp Google's internal data architecture
- Consider how this system might interact with other ranking signals
Attributes (16)
- Baseline (integer, default: nil): The baseline's y-axis offset from the bottom of the word's bounding box, given in pixels. (A value of 2, for instance, indicates the baseline is 2 px above the bottom of the box.)
- Box (GoodocBoundingBox, default: nil; full type: GoogleApi.ContentWarehouse.V1.Model.GoodocBoundingBox.t): The word's bounding box.
- Capline (integer, default: nil): The capline's y-axis offset from the top of the word's bounding box. A positive value n means the capline is n pixels above the top of the word.
- CompactSymbolBoxes (GoodocBoxPartitions, default: nil; full type: GoogleApi.ContentWarehouse.V1.Model.GoodocBoxPartitions.t): For space efficiency, the detailed per-symbol bounding boxes in Symbol.Box are sometimes omitted in favor of this coarser representation, which stores only the symbol boundaries within the Word box. Most client code should not need to deal with this directly; it should be handled in the deepest layers of goodoc reading and writing (for example, see Compress() and Uncompress() in ocean/goodoc/goovols-bigtable-volume.h). Note (viresh): I experimented with this compression, and here are some numbers for reference. If the zlib-compressed page goodoc string starts at size 100, this compaction reduces it to 65. A possible future relaxation: adding a "top" and "bottom" box offset for each symbol would bring the size to 75 (using "repeated int32 top/bottom_offset" fields inside BoxPartitions rather than inside each symbol).
- Confidence (integer, default: nil): Word recognition confidence. The range depends on the OCR engine.
- IsFromDictionary (boolean, default: nil): Specifies whether the word was found in the dictionary.
- IsIdentifier (boolean, default: nil): True if the word represents …
- IsLastInSentence (boolean, default: nil): True if the word is the last word in any sub-paragraph unit that functions at the same level of granularity as a sentence. Examples: "She hit the ball." (regular sentence), "Dewey defeats Truman" (heading), "The more, the merrier." (no verb). Note: not currently used; code to set this was introduced in CL 7038338 and removed in OCL=10678722.
- IsNumeric (boolean, default: nil): True if the word represents a number.
- Label (GoodocLabel, default: nil; full type: GoogleApi.ContentWarehouse.V1.Model.GoodocLabel.t)
- Penalty (integer, default: nil): Penalty for discordance of characters in a word. The meaning and range depend on the OCR engine or subsequent processing.
- RotatedBox (GoodocRotatedBoundingBox, default: nil; full type: GoogleApi.ContentWarehouse.V1.Model.GoodocRotatedBoundingBox.t): If RotatedBox is set, Box must be set as well. See RotatedBoundingBox.
- Symbol (list of GoodocSymbol, default: nil; full type: list(GoogleApi.ContentWarehouse.V1.Model.GoodocSymbol.t)): Word characters, the text may …
- alternates (GoodocWordAlternates, default: nil; full type: GoogleApi.ContentWarehouse.V1.Model.GoodocWordAlternates.t)
- text (string, default: nil; full type: String.t): As a shortcut, the content API provides the text of words instead of individual symbols (NOTE: this is experimental). The text is UTF-8, and the main font for the word is stored in Label.CharLabel.
- writingDirection (string, default: nil; full type: String.t): Writing direction for this word.
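The Baseline and Capline offsets are measured from opposite edges of the word's bounding box, and in both cases a positive value points upward. The sketch below converts them to absolute y coordinates. It assumes image coordinates with y growing downward and a box with hypothetical `left`, `top`, `width`, and `height` fields; the real GoodocBoundingBox layout is not shown on this page.

```python
from dataclasses import dataclass


@dataclass
class BoundingBox:
    """Hypothetical stand-in for GoodocBoundingBox (field names are assumed)."""
    left: int
    top: int
    width: int
    height: int


def baseline_y(box: BoundingBox, baseline: int) -> int:
    """Absolute y of the baseline, assuming y grows downward.

    Baseline is an offset from the bottom of the box; a positive value
    moves the baseline up, i.e. toward smaller y.
    """
    bottom = box.top + box.height
    return bottom - baseline


def capline_y(box: BoundingBox, capline: int) -> int:
    """Absolute y of the capline; a positive Capline n puts it n px above the top."""
    return box.top - capline


box = BoundingBox(left=10, top=100, width=60, height=40)
print(baseline_y(box, 2))  # 138: 2 px above the box bottom at y=140
print(capline_y(box, 3))   # 97: 3 px above the box top at y=100
print(baseline_y(box, 2) - capline_y(box, 3))  # 41: capline-to-baseline distance
```

Under these assumptions the capline can sit outside the word box (for example, when a word has no ascenders), which is why Capline is expressed as an offset rather than as a position inside the box.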
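CompactSymbolBoxes replaces per-symbol boxes with symbol boundaries inside the word box. The following is a minimal sketch of how a reader might expand that coarse form back into approximate per-symbol boxes. The boundary encoding (interior x offsets relative to the word's left edge) and the `expand_partitions` helper are hypothetical. They are not the actual GoodocBoxPartitions schema or the Uncompress() logic in goovols-bigtable-volume.h. The optional top/bottom offsets model the per-symbol "top"/"bottom" relaxation floated in the CompactSymbolBoxes note.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BoundingBox:
    """Hypothetical stand-in for GoodocBoundingBox (field names are assumed)."""
    left: int
    top: int
    width: int
    height: int


def expand_partitions(
    word_box: BoundingBox,
    boundaries: List[int],
    top_offsets: Optional[List[int]] = None,
    bottom_offsets: Optional[List[int]] = None,
) -> List[BoundingBox]:
    """Rebuild approximate per-symbol boxes from symbol boundaries (hypothetical encoding).

    `boundaries` holds the x offsets, relative to word_box.left, where one
    symbol ends and the next begins, so n symbols need n - 1 boundaries.
    Without offsets, every symbol spans the full word height; optional
    per-symbol top/bottom offsets shrink each box vertically.
    """
    edges = [0] + list(boundaries) + [word_box.width]
    if any(a > b for a, b in zip(edges, edges[1:])):
        raise ValueError("boundaries must be sorted and lie within the word box")
    n = len(edges) - 1
    tops = top_offsets or [0] * n
    bottoms = bottom_offsets or [0] * n
    if len(tops) != n or len(bottoms) != n:
        raise ValueError("need one top and one bottom offset per symbol")
    boxes = []
    for i in range(n):
        boxes.append(
            BoundingBox(
                left=word_box.left + edges[i],
                top=word_box.top + tops[i],
                width=edges[i + 1] - edges[i],
                height=word_box.height - tops[i] - bottoms[i],
            )
        )
    return boxes


word = BoundingBox(left=10, top=100, width=60, height=40)
for b in expand_partitions(word, [20, 45]):
    print(b)
```

In this sketch, boxes rebuilt without offsets all share the word's full height, which illustrates the vertical precision a boundaries-only encoding gives up; the optional offsets restore it at the cost of storing two extra integers per symbol.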
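Two documented rules lend themselves to a quick check: a word with RotatedBox must also carry Box, and `text` is a UTF-8 shortcut for the word's symbols. Here is a minimal Python mirror of a few GoodocWord fields, under stated assumptions: the class and field names are hypothetical, `Symbol.code` is assumed to hold a Unicode code point, symbols are assumed to be stored in reading order, and `RotatedBoundingBox` is an empty placeholder.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BoundingBox:
    """Hypothetical stand-in for GoodocBoundingBox (field names are assumed)."""
    left: int
    top: int
    width: int
    height: int


@dataclass
class RotatedBoundingBox:
    """Empty placeholder for GoodocRotatedBoundingBox (real fields omitted)."""


@dataclass
class Symbol:
    """Hypothetical GoodocSymbol stand-in; `code` is assumed to be a Unicode code point."""
    code: int


@dataclass
class Word:
    box: Optional[BoundingBox] = None
    rotated_box: Optional[RotatedBoundingBox] = None
    symbol: List[Symbol] = field(default_factory=list)
    text: Optional[str] = None

    def validate(self) -> None:
        # Documented invariant: if RotatedBox is set, Box must be set as well.
        if self.rotated_box is not None and self.box is None:
            raise ValueError("RotatedBox is set but Box is missing")

    def utf8_text(self) -> bytes:
        # Prefer the experimental `text` shortcut; otherwise concatenate the
        # symbols, assuming they are stored in reading order.
        if self.text is not None:
            return self.text.encode("utf-8")
        return "".join(chr(s.code) for s in self.symbol).encode("utf-8")


word = Word(
    box=BoundingBox(0, 0, 30, 12),
    rotated_box=RotatedBoundingBox(),
    symbol=[Symbol(ord("c")), Symbol(ord("a")), Symbol(0xE9)],
)
word.validate()
print(word.utf8_text())  # b'ca\xc3\xa9'
```

Falling back to the symbols keeps the sketch usable when the experimental `text` field is absent; if the real Symbol ordering differs from reading order, the concatenation step is the part to revisit.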