Query JSON Data in Object Storage using External Tables

With Autonomous AI Database, you can access JSON documents stored in Oracle Cloud Infrastructure Object Storage using external tables.

Depending on the database version, you can run queries on JSON documents in different formats.

  • Query JSON documents as JSON objects with Oracle AI Database 26ai

  • Query JSON documents as textual data with Oracle Database 19c or earlier

Autonomous AI Database processes the following general JSON document representation in external files:

  • Multiple single line JSON documents within files

  • Single multiline JSON document per file

Here are some examples explaining these in detail.

Examples: Querying JSON Documents as JSON Objects

Autonomous AI Database allows running queries on JSON documents as JSON objects.

You can query various types of JSON documents as objects, such as:

  • Line-delimited JSON documents

  • JSON arrays

  • Object wrapped JSON arrays

  • Extended JSON (EJSON) Support

  • Single JSON document with multiline files

The following examples demonstrate how to access JSON documents as objects with Oracle AI Database 26ai using the format option jsondoc.

Example 1: Query Line-delimited JSON Documents as JSON Objects

This example shows how to query a JSON file containing multiple line-delimited JSON documents in object storage.

A JSON file, for example, fruitLineDelimited.json exists in the object storage that has three lines with one object per line.

{ "name": "apple", "count": 20 }
{ "name": "orange", "count": 42 }
{ "name": "pear", "count": 10 }
  1. Create an external table in your database using the JSON file.

    BEGIN
       DBMS_CLOUD.CREATE_EXTERNAL_TABLE(
        table_name =>'fruit_1',
        credential_name =>'DEF_CRED_NAME',
        format => json_object('type' value 'jsondoc'),
        file_uri_list =>'https://objectstorage.us-phoenix-1.oraclecloud.com/n/namespace-string/b/bucketname/o/fruitLineDelimited.json';
    END;
    /
    

    The parameters are:

    • table_name: is the external table name.

    • credential_name: is the name of the credential created in the previous step. The credential_name parameter must conform to Oracle object naming conventions. See Database Object Naming Rules for more information.

    • format: defines the options you can specify to describe the format of the source file.

    • file_uri_list: is a comma delimited list of the source files you want to query.

  2. Query the external table using the SELECT statement.

    SELECT * FROM fruit_1;

Example 2: Query JSON Arrays as JSON Objects

This example shows how to query a single array of JSON objects. Arrays are unpacked by default.

A JSON file, for example, fruitArray.json exists in the object storage and has the following data.

[
   {
      "name" : "apple",
      "count": 20
   },
   {
      "name" : "orange",
      "count": 42
   },
   {
      "name" : "pear",
      "count": 10
   }
]
  1. Create an external table in your database using the JSON file.

    BEGIN
       DBMS_CLOUD.CREATE_EXTERNAL_TABLE(
        table_name =>'fruit_2a',
        credential_name =>'DEF_CRED_NAME',
        format => json_object('type' value 'jsondoc'),
        file_uri_list =>'https://objectstorage.us-phoenix-1.oraclecloud.com/n/namespace-string/b/bucketname/o/fruitArray.json';
    END;
    /
    
  2. Query the external table using the SELECT statement.

    SELECT * FROM fruit_2a;
  3. You can use the jsonpath option to preserve the array structure and return an array of JSON object.

    BEGIN
       DBMS_CLOUD.CREATE_EXTERNAL_TABLE(
        table_name =>'fruit_2b',
        credential_name =>'DEF_CRED_NAME',
        format => json_object('type' value 'jsondoc', 'jsonpath' value '$'),
        file_uri_list =>'https://objectstorage.us-phoenix-1.oraclecloud.com/n/namespace-string/b/bucketname/o/fruitArray.json';
    END;
    /
    
  4. Query the external table using the SELECT statement.

    SELECT * FROM fruit_2b;

Example 3: Query Object Wrapped JSON Arrays as JSON Objects

This example shows how to query a JSON file wrapped in an outer JSON document. In this example, you provide a path using the format option jsonpath to the data that you want to load. The path must lead to an array. The rows are mapped as in the previous example.

A JSON file, for example, fruitEmbeddedArray.json exists in the object storage and has the following data:

{
  "last_updated": 1434054678,
  "ttl": 0,
  "version": "1.0",
  "fruit": [
    { "name" : "apple", "count": 20 },
    { "name" : "orange", "count": 42 },
    { "name" : "pear", "count": 10 }
  ]
}
  1. Create an external table in your database using the JSON file.

    BEGIN
       DBMS_CLOUD.CREATE_EXTERNAL_TABLE(
        table_name =>'fruit_3',
        credential_name =>'DEF_CRED_NAME',
        format => json_object('type' value 'jsondoc', 'jsonpath' value '$.fruit[*]'),
        file_uri_list =>'https://objectstorage.us-phoenix-1.oraclecloud.com/n/namespace-string/b/bucketname/o/fruitEmbeddedArray.json';
    END;
    /
    
  2. Query the external table using the SELECT statement.

    SELECT * FROM fruit_3;

Example 4: Query Extended JSON (EJSON) Documents as JSON Objects

This example shows how to query an EJSON file. The SQL type JSON can represent extended JSON types such as TIMESTAMP, DOUBLE, FLOAT, and RAW. The JSON text can represent extended JSON types by using the extended JSON format. These EJSON annotations are automatically converted to the corresponding types.

An EJSON file, for example, fruitEjson.json exists in the object storage and has the following data:

{ "name" : "apple", "count": 20, "modified":{"$date":"2020-06-29T11:53:05.439Z"} }
{ "name" : "orange", "count": 42 }
{ "name" : "pear", "count": 10 }
  1. Create an external table in your database using the JSON file.

    BEGIN
       DBMS_CLOUD.CREATE_EXTERNAL_TABLE(
        table_name =>'fruit_4',
        credential_name =>'DEF_CRED_NAME',
        format => json_object('type' value 'jsondoc'),
        file_uri_list =>'https://objectstorage.us-phoenix-1.oraclecloud.com/n/namespace-string/b/bucketname/o/fruitEjson.json';
    END;
    /
    
  2. Query the external table using the SELECT statement.

    SELECT * FROM fruit_4;

Example 5: Query a Single JSON Document with Multiline Files as a JSON Object

This example shows how to query a single JSON document with multiline files.

  • A single JSON document with multiline files can be mapped to a table, where each JSON file in the directory is mapped to a single row.

  • A single JSON document with multiline files can be a directory containing JSON files where each JSON file is mapped to a single row in the table.

This example uses JSON files containing the following data:

fruitMultiLineFile1.json

{
   "name" : "apple",
   "count": 42
}

fruitMultiLineFile2.json

{
   "name" : "orange",
   "count": 5
}

fruitMultiLineFile3.json

{
   "name" : "pear",
   "count": 10
}
  1. Create an external table in your database using these JSON files.

    BEGIN
       DBMS_CLOUD.CREATE_EXTERNAL_TABLE(
        table_name =>'fruit_5',
        credential_name =>'DEF_CRED_NAME',
        format => json_object('type' value 'jsondoc'),
        file_uri_list =>'https://objectstorage.us-phoenix-1.oraclecloud.com/n/namespace-string/b/bucketname/o/fruitMultiLine*.json';
    END;
    /
    
  2. Query the external table using the SELECT statement.

    SELECT * FROM fruit_5;

Examples: Querying JSON Documents as Textual Data

Autonomous AI Database allows running queries on JSON documents as textual data.

Oracle Database 19c does not support a native JSON data type. So, an external data in JSON data representation is handled as textual data. Consequently, JSON data in external objects is processed using the VARCHAR2, CLOB, or BLOB data types.

Points to Consider While Accessing JSON Documents

  • Define the field delimiter option to a value or character that is not in your JSON data. In other words, the field delimiter value does not exist in your JSON documents. When you create a table with the DBMS_CLOUD package, the default field delimiter is ‘|’ (pipe).

    Let us consider a scenario where a JSON data contains ‘|’. For example:

    {"attr1": "oK4IJ|V", "attr2": "igN”}

    If the field delimiter is not specified, ‘|’ is considered as the field delimiter and a query returns the following.

    {"attr1": "oK4IJ

    To ensure that your JSON documents are not split unintentionally, rendering the data to invalid JSON documents, set the field delimiter format option to a value that is not in the data. For example:

    json_object('delimiter' value 'X''0''')
  • If your JSON documents are stored as single document line-delimited data, they are processed as textual data without any issues. For a JSON document that spans across multiple rows in an external file, you must set the format option recorddelimiter to a value or character that does not exist in the data file. For example:

    json_object('recorddelimiter' value '0x''1B''')

    If the recorddelimiter option is not set, every line is considered as an individual record. The default record delimiter newline renders the data to invalid JSON documents.

  • In an external data file, if a field is not explicitly specified then the default data type is set to CHAR(255). Thus, reading JSON documents larger than 255 characters returns an error. Therefore, you must set the parameter field_list to a value higher than the largest JSON document within your files. For example:

    field_list =>'"MYDATA" char(10000)'
  • The default read buffer size in Autonomous Database is 10 MB. For JSON documents larger than 10 MB, you must set the format option readsize to a higher value, let's say 20 MB.

    json_object('readsize' value '20000000')

Here is a sample external table definition that includes all these parameters.

BEGIN
   DBMS_CLOUD.CREATE_EXTERNAL_TABLE(
    table_name =>'fruit_5',
    credential_name =>'DEF_CRED_NAME',
    format => json_object('type' value 'jsondoc'),
    file_uri_list =>'https://objectstorage.us-phoenix-1.oraclecloud.com/n/namespace-string/b/bucketname/o/fruitMultiLine*.json';
END;
/

The following examples demonstrate how to access JSON documents as textual data with Oracle Database 19c or earlier using the CLOB data type. Alternatively, you can use the VARCHAR2 data type depending on the size of your JSON documents.

  • Line-delimited JSON documents

  • JSON arrays

  • Object wrapped JSON arrays

  • Single JSON document with multiline files

Example 1: Query Line-delimited JSON Documents as Textual Data

This example shows how to query a JSON file containing multiple line-delimited JSON documents in object storage.

A JSON file, for example, fruitLineDelimited.json exists in the object storage that has three lines with one object per line.

{ "name": "apple", "count": 20 }
{ "name": "orange", "count": 42 }
{ "name": "pear", "count": 10 }
  1. Create an external table in your database using the JSON file.

    BEGIN
       DBMS_CLOUD.CREATE_EXTERNAL_TABLE(
        table_name =>'fruit_6',
        column_list => 'mydata clob',
        credential_name =>'DEF_CRED_NAME',
        field_list =>'"MYDATA" char(10000)',
        format => json_object('delimiter' value 'X''0'''),
        file_uri_list =>'https://objectstorage.us-phoenix-1.oraclecloud.com/n/namespace-string/b/bucketname/o/fruitLineDelimited.json';
    END;
    /
    

    The parameters are:

    • table_name: is the external table name.

    • column_list: is a comma delimited list of column names and data types for the external table. The list includes the columns inside the data file and those derived from the object name (from names in the file path specified by file_uri_list).

    • credential_name: is the name of the credential created in the previous step. The credential_name parameter must conform to Oracle object naming conventions. See Database Object Naming Rules for more information.

    • field_list: identifies the fields in the source files and their data types.

    • format: defines the options you can specify to describe the format of the source file.

    • file_uri_list: is a comma delimited list of the source files you want to query.

  2. Query the external table using the SELECT statement.

    SELECT * FROM fruit_6;

Example 2: Query JSON Arrays as Textual Data

This example shows how to query a single array of JSON objects. You cannot automatically unpack arrays as part of the external table data access, but you can do the unpacking while accessing the data.

A JSON file, for example, fruitArray.json exists in the object storage and has the following data.

[
   {
      "name" : "apple",
      "count": 20
   },
   {
      "name" : "orange",
      "count": 42
   },
   {
      "name" : "pear",
      "count": 10
   }
]
  1. Create an external table in your database using the JSON file.

    BEGIN
       DBMS_CLOUD.CREATE_EXTERNAL_TABLE(
        table_name =>'fruit_7',
        column_list => 'mydata clob',
        credential_name =>'DEF_CRED_NAME',
        field_list =>'"MYDATA" char(10000)',
        format => json_object('delimiter' value 'X''0''','recorddelimiter' value '0x''1B'''),
        file_uri_list =>'https://objectstorage.us-phoenix-1.oraclecloud.com/n/namespace-string/b/bucketname/o/fruitArray.json';
    END;
    /
    
  2. Query the external table using the SELECT statement.

    SELECT * 
    FROM fruit_7
    NESTED mydata 
    COLUMNS (nested path '$[*]' 
    COLUMNS (data format json path '$'));

Example 3: Query Object Wrapped JSON Arrays as Textual Data

This example shows how to query a JSON file wrapped in an outer JSON document. You cannot automatically unpack arrays as part of the external table data access, but you can do the unpacking and JSON document subsetting while accessing the data.

A JSON file, for example, fruitEmbeddedArray.json exists in the object storage and has the following data:

{
  "last_updated": 1434054678,
  "ttl": 0,
  "version": "1.0",
  "fruit": [
    { "name" : "apple", "count": 20 },
    { "name" : "orange", "count": 42 },
    { "name" : "pear", "count": 10 }
  ]
}
  1. Create an external table in your database using the JSON file.

    BEGIN
       DBMS_CLOUD.CREATE_EXTERNAL_TABLE(
        table_name =>'fruit_8',
        column_list => 'mydata clob',
        credential_name =>'DEF_CRED_NAME',
        field_list =>'"MYDATA" char(10000)',
        format => json_object('delimiter' value 'X''0''','recorddelimiter' value '0x''1B'''),
        file_uri_list =>'https://objectstorage.us-phoenix-1.oraclecloud.com/n/namespace-string/b/bucketname/o/fruitEmbeddedArray.json';
    END;
    /
    
  2. Query the external table using the SELECT statement.

    SELECT * 
    FROM fruit_8 
    NESTED mydata 
    COLUMNS (nested path '$.fruit[*]' 
    COLUMNS (data format json path '$'));
    

Example 4: Query a Single JSON Document with Multiline Files as Textual Data

This example shows how to query a single JSON document with multiline files.

  • A single JSON document with multiline files can be mapped to a table, where each JSON file in the directory is mapped to a single row.

  • A single JSON document with multiline files can be a directory containing JSON files where each JSON file is mapped to a single row in the table.

This example uses JSON files containing the following data:

fruitMultiLineFile1.json

{
   "name" : "apple",
   "count": 42
}

fruitMultiLineFile2.json

{
   "name" : "orange",
   "count": 5
}

fruitMultiLineFile3.json

{
   "name" : "pear",
   "count": 10
}
  1. Create an external table in your database using these JSON files.

    BEGIN
       DBMS_CLOUD.CREATE_EXTERNAL_TABLE(
        table_name =>'fruit_9',
        column_list => 'mydata clob',
        credential_name =>'DEF_CRED_NAME',
        field_list =>'"MYDATA" char(10000)',
        format => json_object('delimiter' value 'X''0''','recorddelimiter' value '0x''1B'''),
        file_uri_list =>'https://objectstorage.us-phoenix-1.oraclecloud.com/n/namespace-string/b/bucketname/o/fruitMultiLine*.json';
    END;
    /
    
  2. Query the external table using the SELECT statement.

    SELECT * FROM fruit_9;

For more information on JSON documents, see JSON Document Stores