GoodocWord

GoodocInfrastructure

GoogleApi.ContentWarehouse.V1.Model.GoodocWord

SEO Impact: Low (3 out of 10)
A word representation

SEO Analysis

AI Generated

Backend infrastructure with indirect SEO impact. This model (Goodoc Word) contains SEO-relevant attributes, including Penalty. Key functionality includes the baseline's y-axis offset from the bottom of the word's bounding box, given in pixels.

Actionable Insights for SEOs

  • Understanding this model helps SEOs grasp Google's internal data architecture
  • Consider how this system might interact with other ranking signals

Attributes

16 attributes
Baseline: integer()
Default: nil

The baseline's y-axis offset from the bottom of the word's bounding box, given in pixels. (A value of 2, for instance, indicates the baseline is 2px above the bottom of the box.)

Box
Default: nil
Full type: GoogleApi.ContentWarehouse.V1.Model.GoodocBoundingBox.t

Capline: integer()
Default: nil

The capline is the y-axis offset from the top of the word bounding box. A positive value n indicates that the capline is n pixels above the top of this word.
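Baseline and Capline are measured from different edges of the word box, which is easy to mix up. The following is a minimal Python sketch of the arithmetic, not code from the API. It assumes image coordinates in which y grows downward, and it uses a hypothetical `WordMetrics` class with `top` and `height` fields standing in for the word's bounding box; none of these names are confirmed by this page.

```python
from dataclasses import dataclass


@dataclass
class WordMetrics:
    """Hypothetical stand-in for the fields used here; not the real model."""
    top: int       # y of the box's top edge (assumed: y grows downward)
    height: int    # box height in pixels
    baseline: int  # offset above the bottom edge of the box
    capline: int   # offset above the top edge of the box


def baseline_y(w: WordMetrics) -> int:
    # The bottom edge is top + height; the baseline sits `baseline` px above it.
    return w.top + w.height - w.baseline


def capline_y(w: WordMetrics) -> int:
    # A positive capline lies above the top edge, i.e. at a smaller y.
    return w.top - w.capline


w = WordMetrics(top=100, height=20, baseline=2, capline=1)
print(baseline_y(w), capline_y(w))
```

If the coordinate system instead had y growing upward, both signs would flip, so the assumption is worth checking against real data before relying on it.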

CompactSymbolBoxes
Default: nil
Full type: GoogleApi.ContentWarehouse.V1.Model.GoodocBoxPartitions.t

For space efficiency, we sometimes skip the detailed per-symbol bounding boxes in Symbol.Box, and use this coarser representation instead, where we just store Symbol boundaries within the Word box. Most client code should not have to worry directly about this; it should be handled in the deepest layers of writing/reading goodocs (for example, see Compress() and Uncompress() in ocean/goodoc/goovols-bigtable-volume.h).

Note(viresh): I experimented with this compression, and here are some numbers for reference. If the zlib-compressed page goodoc string size was 100 to start with, then this compaction makes it 65. As a possible future relaxation to consider: if we add in, for each symbol, a "top" and "bottom" box offset, then the size would be 75 (that's with "repeated int32 top/bottom_offset" fields inside BoxPartitions, instead of inside each symbol).
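To make the coarser representation concrete, here is a rough Python sketch of how per-symbol boxes might be rebuilt from symbol boundaries stored within the word box. The `Box` tuple, the `spans` list of consecutive symbol widths, and `expand_partitions` are all hypothetical names chosen for illustration; the actual layout of BoxPartitions and the behavior of Uncompress() are not described on this page.

```python
from typing import List, NamedTuple


class Box(NamedTuple):
    left: int
    top: int
    width: int
    height: int


def expand_partitions(word_box: Box, spans: List[int]) -> List[Box]:
    """Rebuild coarse per-symbol boxes from symbol widths inside a word box.

    `spans` is a hypothetical list of consecutive symbol widths. Every
    symbol inherits the word's top and height, which is the precision
    the compact form gives up.
    """
    if sum(spans) != word_box.width:
        raise ValueError("spans must add up to the word box width")
    boxes, left = [], word_box.left
    for span in spans:
        boxes.append(Box(left, word_box.top, span, word_box.height))
        left += span
    return boxes


print(expand_partitions(Box(10, 5, 30, 12), [8, 12, 10]))
```

Because every rebuilt box shares the word's top and height, per-symbol vertical extent is lost, which is what the "top" and "bottom" offsets mentioned in the note would restore at a larger size.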

Confidence: integer()
Default: nil

Word recognition confidence. The range depends on the OCR engine.

IsFromDictionary: boolean()
Default: nil

Specifies whether the word was found in the dictionary.

IsIdentifier: boolean()
Default: nil

True if the word represents an identifier.

IsLastInSentence: boolean()
Default: nil

True if the word is the last word in any sub-paragraph unit that functions at the same level of granularity as a sentence. Examples:

  • "She hit the ball." (regular sentence)
  • "Dewey defeats Truman" (heading)
  • "The more, the merrier." (no verb)

Note: not currently used. Code to set this was introduced in CL 7038338 and removed in OCL=10678722.

IsNumeric: boolean()
Default: nil

True if the word represents a number.

Label
Default: nil
Full type: GoogleApi.ContentWarehouse.V1.Model.GoodocLabel.t

Penalty: integer()
Default: nil

Penalty for discordance of characters in a word. The meaning and range depend on the OCR engine or subsequent processing.

RotatedBox
Default: nil
Full type: GoogleApi.ContentWarehouse.V1.Model.GoodocRotatedBoundingBox.t

If RotatedBox is set, Box must be set as well. See RotatedBoundingBox.
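The constraint between RotatedBox and Box can be checked mechanically. A small Python sketch, assuming a word has been decoded into a plain mapping keyed by the attribute names on this page (the mapping shape and the `rotated_box_is_consistent` helper are illustrative assumptions, not part of the API):

```python
from typing import Any, Mapping


def rotated_box_is_consistent(word: Mapping[str, Any]) -> bool:
    """Return False when RotatedBox is present but Box is missing."""
    has_rotated = word.get("RotatedBox") is not None
    has_box = word.get("Box") is not None
    # Box alone is fine; RotatedBox alone violates the stated rule.
    return has_box or not has_rotated


print(rotated_box_is_consistent({"Box": {}, "RotatedBox": {}}))
print(rotated_box_is_consistent({"RotatedBox": {}}))
```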

Symbol
Default: nil
Full type: list(GoogleApi.ContentWarehouse.V1.Model.GoodocSymbol.t)

Word characters, the text may

alternates
Default: nil
Full type: GoogleApi.ContentWarehouse.V1.Model.GoodocWordAlternates.t

text: string
Default: nil
Full type: String.t

As a shortcut, the content API provides the text of words instead of individual symbols (NOTE: this is experimental). The text is UTF-8, and the main font for the word is stored in Label.CharLabel.
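Since the text shortcut is marked experimental, a consumer might want a fallback to the individual symbols. The Python sketch below assumes a word decoded into a plain mapping, and assumes each symbol carries an integer Unicode code point under a hypothetical `Code` key; neither detail is confirmed by this page, and `word_text` is an illustrative helper only.

```python
from typing import Any, Mapping


def word_text(word: Mapping[str, Any]) -> str:
    """Prefer the experimental `text` shortcut, else join the symbols.

    Assumes each entry in `Symbol` carries an integer code point under a
    hypothetical `Code` key.
    """
    text = word.get("text")
    if text:
        return text
    # Fall back to concatenating one character per symbol, in order.
    return "".join(chr(s["Code"]) for s in word.get("Symbol") or [])


print(word_text({"Symbol": [{"Code": 72}, {"Code": 105}]}))
```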

writingDirection: string
Default: nil
Full type: String.t

Writing direction for this word.