Guide: JSON Schema
Learn how to build a schema for your unstructured data
Atlos uses a subset of JSON Schema to extract structured data from unstructured data.
Basics
A JSON schema is made up of Fields.
A field has the following components:
- name: The name of the field
- type: The type of the field
- description: A description of the field
⚠️ Important: Only these three components are supported. Additional components will fail.
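As a minimal sketch, a single field could be written as below. It assumes Atlos follows the standard JSON Schema layout, where the field name is the key under properties and the type and description sit inside it. The full_name field and its description are illustrative only.

```python
import json

# Illustrative schema with one field. Assumption: the field name is the key
# under "properties", as in standard JSON Schema.
schema = {
    "type": "object",
    "properties": {
        "full_name": {  # name
            "type": "string",  # type
            "description": "The person's full name as written in the document",  # description
        }
    },
}

# Serialize to the JSON text that would be submitted as the schema.
print(json.dumps(schema, indent=2))
```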
Field Types
- string: Text data like names, addresses, and emails
- integer: Whole numbers like page count or household size
- number: Decimal numbers like prices or measurements
- boolean: True or False
- enum: List of predefined values like colors or sizes
- array: List of items of the same type
- object: Groups related data together
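The sketch below shows one illustrative field per type. How Atlos expects enum values, array element types, and nested fields to be written is not stated here, so the enum, items, and properties keys are assumptions borrowed from standard JSON Schema, and every field name is hypothetical.

```python
import json

# One illustrative field per type. Assumptions: enum values go under "enum",
# array element types under "items", and nested fields under "properties",
# as in standard JSON Schema.
properties = {
    "email": {"type": "string", "description": "Contact email address"},
    "page_count": {"type": "integer", "description": "Number of pages"},
    "price": {"type": "number", "description": "Unit price"},
    "is_paid": {"type": "boolean", "description": "Whether the invoice is paid"},
    "size": {
        "type": "string",
        "enum": ["small", "medium", "large"],
        "description": "Garment size",
    },
    "tags": {
        "type": "array",
        "items": {"type": "string"},
        "description": "Free-form tags",
    },
    "address": {
        "type": "object",
        "description": "Postal address",
        "properties": {
            "city": {"type": "string", "description": "City name"},
        },
    },
}

schema = {"type": "object", "properties": properties}
print(json.dumps(schema, indent=2))
```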
Best Practices
Naming
- Provide specific field names, e.g. prefer full_name over name.
- Use a consistent naming convention such as snake_case.
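One way to keep names consistent is to check them before submitting a schema. The helper below is a hypothetical sketch, not part of Atlos: it flags top-level field names that are not snake_case, assuming fields live under a properties key.

```python
import re

# snake_case: lowercase words of letters and digits joined by single underscores.
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")


def non_snake_case_names(schema: dict) -> list[str]:
    """Return the top-level field names that are not snake_case."""
    return [name for name in schema.get("properties", {}) if not SNAKE_CASE.match(name)]


schema = {
    "type": "object",
    "properties": {
        "full_name": {"type": "string", "description": "Full name"},
        "dateOfBirth": {"type": "string", "description": "Date of birth"},
    },
}

print(non_snake_case_names(schema))  # ['dateOfBirth']
```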
Utilize Enums
Enums should always be used when there is a known set of possible values.
For example, prefer an enum that lists the allowed values over a plain string type.
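The contrast can be sketched as follows. The color values are illustrative, and listing them under an enum key is an assumption taken from standard JSON Schema.

```python
import json

# Preferred: the allowed values are spelled out. Assumption: they are listed
# under an "enum" key, as in standard JSON Schema.
color_enum = {
    "type": "string",
    "enum": ["red", "green", "blue"],
    "description": "Color of the product",
}

# Less precise: any text is accepted, so spelling variants can slip through.
color_string = {
    "type": "string",
    "description": "Color of the product",
}

print(json.dumps({"preferred": color_enum, "avoid": color_string}, indent=2))
```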
Specify Required Fields
- Make use of the required property to specify which fields are needed so that data is extracted correctly.
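A rough sketch of how required fields might be declared and then checked against an extracted record. It assumes required is a top-level list of field names, as in standard JSON Schema. The missing_required helper and the invoice fields are hypothetical.

```python
def missing_required(schema: dict, record: dict) -> list[str]:
    """Return the required field names that are absent from an extracted record."""
    return [name for name in schema.get("required", []) if name not in record]


schema = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string", "description": "Invoice identifier"},
        "total": {"type": "number", "description": "Total amount due"},
        "notes": {"type": "string", "description": "Optional free-text notes"},
    },
    # Assumption: required fields are listed by name at the top level.
    "required": ["invoice_number", "total"],
}

record = {"invoice_number": "INV-001", "notes": "Net 30"}
print(missing_required(schema, record))  # ['total']
```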
Example Schemas
Simple
A simple schema which extracts the required first and last name values from a document.
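A minimal sketch of such a schema, built as a Python dictionary and serialized to JSON. The first_name and last_name field names and the placement of required are assumptions based on standard JSON Schema.

```python
import json

# Illustrative schema: both name fields are marked as required.
simple_schema = {
    "type": "object",
    "properties": {
        "first_name": {"type": "string", "description": "The person's first name"},
        "last_name": {"type": "string", "description": "The person's last name"},
    },
    "required": ["first_name", "last_name"],
}

print(json.dumps(simple_schema, indent=2))
```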
Enums
A schema which extracts the shipping status of an order.
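A sketch of what this could look like. The shipping_status field name and the four status values are illustrative, and the enum key follows standard JSON Schema rather than a confirmed Atlos format.

```python
import json

# Illustrative schema: the status must be one of a fixed set of values.
shipping_schema = {
    "type": "object",
    "properties": {
        "shipping_status": {
            "type": "string",
            "enum": ["pending", "shipped", "delivered", "returned"],
            "description": "Current shipping status of the order",
        }
    },
    "required": ["shipping_status"],
}

print(json.dumps(shipping_schema, indent=2))
```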
Objects & Arrays
An object schema which extracts data about a Person. Take note of the nested address object.
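The nesting might be sketched as below. All field names are hypothetical, and the properties, items, and required keys are assumptions taken from standard JSON Schema.

```python
import json

# Illustrative Person schema with an array field and a nested address object.
person_schema = {
    "type": "object",
    "properties": {
        "full_name": {"type": "string", "description": "The person's full name"},
        "age": {"type": "integer", "description": "Age in years"},
        "phone_numbers": {
            "type": "array",
            "items": {"type": "string"},  # every item has the same type
            "description": "All phone numbers listed for the person",
        },
        "address": {
            "type": "object",  # nested object with its own fields
            "description": "The person's postal address",
            "properties": {
                "street": {"type": "string", "description": "Street and house number"},
                "city": {"type": "string", "description": "City name"},
                "postal_code": {"type": "string", "description": "Postal or ZIP code"},
            },
            "required": ["city"],
        },
    },
    "required": ["full_name"],
}

print(json.dumps(person_schema, indent=2))
```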
Schemas can be generated within our Playground, or programmatically using our API.
Something missing?
If you need help with something that is not covered in the documentation, please let us know by sending a message to alex@atlos.dev.