QuickStart

Use the following steps to begin building enterprise AI agents with OCI Generative AI. This guide walks you through the initial setup, with context along the way so you understand not just what to do, but why it matters.

Prerequisite: Set Up IAM Permissions

Before creating a project, ensure that the appropriate user groups have access to OCI Generative AI resources. Without these permissions, you won’t be able to create or manage projects and related assets.

OCI provides an aggregate resource type, generative-ai-family, which grants access to all Generative AI resource types through a single policy.

Tip

Grant broad access only to administrators or users working in sandbox or development environments. For production use, consider applying more restrictive policies.

Grant Access at the Tenancy Level

To allow a user group to manage all Generative AI resources across the tenancy:

allow group <your-group-name> to manage generative-ai-family 
in tenancy

Grant Access at the Compartment Level

To scope access to a specific compartment:

allow group <your-group-name> to manage generative-ai-family 
in compartment <your-compartment-name>

After these permissions are in place, you’re ready to create the first project.

1. Create a Project

A project is the foundational resource for organizing and managing AI agents and related assets in OCI Generative AI. You can create a project using the Oracle Cloud Console.

After creating a project, you can manage it through the Console—for example, updating its details, moving it to another compartment, managing tags, or deleting it. These actions are available from the Actions menu (three dots) in the project list page.

To begin, navigate to the project list page and select Create project.

Basic Information

Start by defining the core attributes of the project:
  • Name (optional):

    Provide a name that begins with a letter or underscore, followed by letters, numbers, hyphens, or underscores (1–255 characters). If you don’t specify a name, one is automatically generated using the format:

    generativeaiproject<timestamp> (for example, generativeaiproject20260316042443). You can update this later.

  • Description (optional):

    Add a brief description to help identify the purpose of the project.

  • Compartment:

    Select the compartment where the project will reside. By default, this is the current compartment, but you can choose any compartment where you have the required permissions.
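The project-name rule described above can be expressed as a regular expression for local validation. This is an illustrative sketch only; the service performs its own validation:

```python
import re

# Starts with a letter or underscore, then letters, digits, hyphens,
# or underscores, for a total length of 1-255 characters.
NAME_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_-]{0,254}$")

print(bool(NAME_RE.match("my_project-1")))  # True
print(bool(NAME_RE.match("1project")))      # False
```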

Data Retention

Configure how long generated data is stored. This helps you balance usability with data lifecycle requirements:

  • Response retention:

    Defines how long individual model responses are stored after generation.

  • Conversation retention:

    Determines how long an entire conversation is retained after its most recent update.

You can set both values in hours, up to a maximum of 720 hours (30 days).
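If you plan retention periods in days, a small hypothetical helper can convert them to the hour values the form expects while enforcing the 720-hour cap:

```python
def retention_hours(days: float) -> int:
    """Convert a retention period in days to hours, capped at 720 (30 days)."""
    hours = int(days * 24)
    if not 0 < hours <= 720:
        raise ValueError("retention must be between 1 and 720 hours")
    return hours

print(retention_hours(30))   # 720
print(retention_hours(0.5))  # 12
```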

Short-Term Memory Compaction

This feature improves efficiency by summarizing recent conversation history into a compact representation. It helps maintain context while reducing token usage and latency.

  • Enable (optional):

    Turn on short-term memory compaction to automatically condense prior interactions.

  • Model selection:

    If enabled, select a compaction model. Available models vary by region.

Important

  • The compaction model is selected at creation time and cannot be changed later.
  • Once enabled, this feature cannot be disabled without deleting the project.
For the models available in each region, see Generative AI Models and Regions for Agentic API.

Long-Term Memory

Long-term memory allows the system to extract and persist important information from conversations for future use. This data is stored as embeddings, making it searchable and reusable across interactions.
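To make the embedding-based storage concrete, the following self-contained sketch shows how vector similarity search works conceptually. The actual extraction and embedding models are managed by the service; the vectors here are made-up toy values:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two vectors, the usual basis for embedding search."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

memory_embedding = [0.2, 0.8, 0.1]    # a stored memory (toy values)
query_embedding = [0.25, 0.75, 0.05]  # the current query (toy values)
print(round(cosine_similarity(memory_embedding, query_embedding), 3))  # 0.995
```

A higher similarity score means the stored memory is more relevant to the current interaction and can be retrieved for reuse.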

  • Enable (optional):

    Turn on long-term memory to retain key insights from conversations.

  • Model selection:
    • Extraction model: Identifies and captures important information.
    • Embedding model: Converts stored data into vector representations for retrieval.
Important

  • These models must be selected during project creation and can't be changed later.
  • After it's enabled, long-term memory can't be disabled unless the project is deleted.
Tip

For best results, set both response and conversation retention to the maximum duration (720 hours) when using long-term memory.

Tags

Tags help you organize and manage resources.

  • (Optional) Select Add tag to assign metadata to your project.

    For more information, see Resource Tags.

Finally, select Create to create the project.

2. Create an API Key

API keys are used to authenticate requests to OCI Generative AI. In this step, you create a key that applications or tools can use to securely access the service. You name the key and can optionally configure up to two key names with expiration dates and times.

You can create and manage API keys using the Console, CLI, or API.

Important

Ensure that you Add API Key Permission after you create the key.
  • On the API keys list page, select Create API key. If you need help finding the list page, see Listing API Keys.

    Basic information

    1. Enter a Name for the API key (required). Start the name with a letter or underscore, followed by letters, numbers, hyphens, or underscores. The length can be 1–255 characters.
    2. (Optional) Enter a Description.
    3. (Optional) To save the API key in a different compartment than the one listed, select another Compartment.
      You must have permission to work in a compartment to see the resources in it. If you're not sure which compartment to use, contact an administrator. For more information, see Understanding Compartments.
    4. (Optional) Assign Tags to this API key. See Resource Tags.

    Key names and expiration times

    1. Enter a name for the first key: Key one name. Start the name with a letter or underscore, followed by letters, numbers, hyphens, or underscores. The length can be 1–255 characters.
    2. (Optional) Set the Key one expiration date and Key one expiration time (UTC).
      The default value is three months from the creation date.
    3. Enter a name for the second key: Key two name.
    4. (Optional) Set the Key two expiration date and Key two expiration time (UTC).
      The default value is three months from the creation date.
    5. Select Create.
  • Use the api-key create command and required parameters to create an API key.

    oci generative-ai api-key create [OPTIONS]

    For a complete list of parameters and values for CLI commands, see the CLI Command Reference.

  • Run the CreateApiKey operation to create an API key. Provide the display name, optional description, and any key names with their expiration timestamps (UTC).

3. Add Permission to the API Key

Find the API Key OCID

To scope permissions to a specific API key, you need its OCID.

In the Console:

  1. Navigate to the API Keys list page.
  2. Select the API key you created.
  3. Copy the OCID (it typically starts with ocid1.generativeaiapikey...).
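Before pasting the OCID into a policy, a quick application-side sanity check can confirm the value has the expected prefix. The helper below is hypothetical, based on the prefix shown above:

```python
def looks_like_api_key_ocid(value: str) -> bool:
    """Return True if the value starts with the Generative AI API key OCID prefix."""
    return value.startswith("ocid1.generativeaiapikey")

print(looks_like_api_key_ocid("ocid1.generativeaiapikey.oc1..exampleuniqueid"))  # True
print(looks_like_api_key_ocid("ocid1.user.oc1..exampleuniqueid"))               # False
```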

Grant Permission to the API Key

Create an IAM policy to allow the API key to invoke the Responses API:

allow group <your-group-name> 
to manage generative-ai-response in tenancy where ALL 
{request.principal.type='generativeaiapikey', 
request.principal.id='<your-api-key-OCID>'}

This policy allows requests authenticated with the specified API key to access the Responses API, while keeping access scoped and controlled.

4. Call the OCI Responses API

The OCI Responses API is the primary interface for building Enterprise AI agentic applications in OCI Generative AI. It provides a flexible way to combine core capabilities—such as orchestration, reasoning, tools, and conversation state—within a single request.

With this API, you can:

  • Run simple, single-step inference or build multi-step agent workflows
  • Enable or disable reasoning depending on the use case
  • Integrate tools (platform-managed or client-side)
  • Manage conversation state either in the service or within the client

This unified approach allows you to start simple and progressively build more advanced agents, while maintaining control over cost, latency, and behavior.

Base URL

Use the following base URL to access the OCI Responses API:

https://inference.generativeai.<region>.oci.oraclecloud.com/openai/v1

Replace <region> with the appropriate region identifier (for example, us-chicago-1).
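The substitution can be sketched with a small helper (illustrative only, not part of any SDK):

```python
def responses_base_url(region: str) -> str:
    """Build the OCI Responses API base URL for a region identifier."""
    return f"https://inference.generativeai.{region}.oci.oraclecloud.com/openai/v1"

print(responses_base_url("us-chicago-1"))
# https://inference.generativeai.us-chicago-1.oci.oraclecloud.com/openai/v1
```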

SDK Support

The OCI Responses API is compatible with the OpenAI SDK, which is the recommended way to interact with the service. The SDK is available in many languages, including Python, Java, TypeScript, Go, and .NET.

You can also use it with popular agent frameworks such as LangChain, LlamaIndex, and OpenAI Agents SDK.

Install the Official OpenAI SDK (Python)

Python

pip install openai
Note

To invoke the Responses API, ensure that you use the OpenAI SDK, not the OCI SDK. Also ensure you have the latest version of the OpenAI SDK installed.

For other languages, see OpenAI libraries page.

Make Your First Request

The following example shows how to call the Responses API using Python:

from openai import OpenAI

client = OpenAI(
    base_url="https://inference.generativeai.us-chicago-1.oci.oraclecloud.com/openai/v1", # change the region if needed
    api_key="sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx", # replace with your Generative AI API Key created in Step 2
    project="ocid1.generativeaiproject.oc1.us-chicago-1.xxxxxxxx"  # replace with your Generative AI Project OCID created in Step 1
)

response = client.responses.create(
    model="xai.grok-4-1-fast-reasoning",
    input="Write a one-sentence explanation of what a database is."
)

print(response.output_text)

If the response returns an explanation, the OCI Responses API is working correctly.

Understanding OCI Responses API Endpoints

The OCI Responses API uses an OpenAI-compatible interface, but all requests are routed through the OCI Generative AI inference endpoint:

https://inference.generativeai.<region>.oci.oraclecloud.com/openai/v1

This means you can use familiar OpenAI-style APIs (such as /responses or /containers), while all requests are executed within OCI.

Although the APIs follow the OpenAI format, they are fully integrated with OCI:

  • Authentication uses OCI Generative AI API keys or IAM-based access—not OpenAI credentials
  • Resources (such as containers, vector stores, or files) are created and managed within OCI, not in an OpenAI environment
  • Execution and data processing remain entirely within OCI

For example, when you call:

/openai/v1/containers

the container is created and managed in OCI Generative AI.

Only the endpoints listed below are supported. Other OpenAI endpoints aren't compatible with OCI Generative AI.

The remainder of this quickstart provides examples of how to use these endpoints.

Available Endpoints

Inference endpoints
Base URL: https://inference.generativeai.<region>.oci.oraclecloud.com
Authentication: API key or IAM session

  • Responses API: /openai/v1/responses
  • Conversations API: /openai/v1/conversations
  • Files API: /openai/v1/files
  • Vector Store Files API: /openai/v1/vector_stores/{id}/files
  • Vector Store Search: /openai/v1/vector_stores/{id}/search
  • Containers API: /openai/v1/containers

Management endpoints
Base URL: https://generativeai.<region>.oci.oraclecloud.com
Authentication: IAM session only

  • Project CRUD: /20231130/generativeAiProjects
  • API Key CRUD: /20231130/apikeys
  • Semantic Store CRUD: /20231130/semanticStores
  • Vector Store CRUD: /20231130/openai/v1/vector_stores

OCI IAM Authentication

In the previous steps, you used a Generative AI API key to authenticate requests to the OCI Responses API. API keys are a convenient option for quick testing and early development. However, for production workloads, many teams prefer OCI IAM-based authentication for improved security and centralized access control.

The OCI Responses API fully supports OCI IAM authentication. This section shows how to use IAM-based authentication instead of API keys.

When to Use IAM Authentication

Consider using IAM authentication when:

  • Running applications in OCI services (for example, Functions or OKE)
  • Avoiding long-lived credentials such as API keys
  • Enforcing fine-grained access control through IAM policies

Install the OCI IAM Auth Library

Install the oci-genai-auth library, which provides helper utilities for integrating OCI IAM authentication with the OpenAI SDK:

pip install oci-genai-auth

This library includes the following authentication helpers:

  • OciSessionAuth (for local development)
  • OciUserPrincipalAuth
  • OciInstancePrincipalAuth
  • OciResourcePrincipalAuth (for OCI-managed environments)

Configure the OpenAI Client

When using IAM authentication, initialize the OpenAI client with a custom HTTP client and authentication handler. The api_key value isn't used in this case, so set it to a placeholder such as "not-used".

Example: Local Development (OciSessionAuth)

Use this approach when running code locally (for example, on a laptop using an OCI CLI profile):

from openai import OpenAI
from oci_openai import OciSessionAuth
import httpx

client = OpenAI(
    base_url="https://inference.generativeai.us-chicago-1.oci.oraclecloud.com/openai/v1",  # update region if needed
    api_key="not-used",
    project="ocid1.generativeaiproject.oc1.us-chicago-1.xxxxxxxx",  # project OCID created earlier
    http_client=httpx.Client(auth=OciSessionAuth(profile_name="DEFAULT"))  # update profile if needed
)

response = client.responses.create(
    model="xai.grok-4-1-fast-reasoning",
    input="Write a one-sentence explanation of what a database is."
)

print(response.output_text)

Example: OCI Managed Environments (OciResourcePrincipalAuth)

Use this approach when running in OCI services such as OCI Functions or OCI Container Engine for Kubernetes (OKE):

from openai import OpenAI
from oci_openai import OciResourcePrincipalAuth
import httpx

client = OpenAI(
    base_url="https://inference.generativeai.us-chicago-1.oci.oraclecloud.com/openai/v1",  # update region if needed
    api_key="not-used",
    project="ocid1.generativeaiproject.oc1.us-chicago-1.xxxxxxxx",  # project OCID created earlier
    http_client=httpx.Client(auth=OciResourcePrincipalAuth()),
)

Using OCI IAM authentication allows the application to securely access OCI Generative AI without managing API keys, while aligning with standard OCI security practices.

Ensure Required Permissions

Before calling the OCI Responses API, verify that the appropriate IAM policies are in place. The required policies depend on the authentication method used.

If Using OCI IAM Authentication

If you added the policies in Prerequisite: Set Up IAM Permissions, skip this step.

To allow a user group to call the Responses API, add the following policy:

allow group <your-group-name> 
to manage generative-ai-response in tenancy

If Using Generative AI API Key Authentication

When using API key authentication, another policy is required to authorize requests made with the API key. If you added the policies in 3. Add Permission to the API Key, skip this step.

To grant access to a specific API key:

allow group <your-group-name> 
to manage generative-ai-response in tenancy 
where ALL {request.principal.type='generativeaiapikey', 
request.principal.id='<your-api-key-OCID>'}

For broader access (for example, during testing), you can use a more general policy:

allow any-user 
to manage generative-ai-family in tenancy 
where ALL {request.principal.type='generativeaiapikey', 
request.principal.id='<your-api-key-OCID>'}

These policies ensure that requests, whether authenticated through IAM or API keys, are authorized to access OCI Generative AI resources.

Enable Debug Log

If you encounter issues when calling the API, enabling debug logging can help with troubleshooting. Debug logs display the raw HTTP requests and responses, including the opc-request-id, which is useful when working with Oracle support.

You can reference this request ID when reporting issues to help identify and diagnose problems more quickly.

from openai import OpenAI
import logging

logger = logging.getLogger("openai")
logger.setLevel(logging.DEBUG)
logger.addHandler(logging.StreamHandler())

# Create and use the OpenAI client as usual
client = OpenAI(
    ...
)

Call Models

You can use the OCI Responses API to call different types of models available in OCI Generative AI supported regions. For a list of supported models and regions, see Agent Models and Regions.

Third-Party Hosted Models

OCI Generative AI provides access to models from third-party providers. Specify the model using its fully qualified name:

response = client.responses.create(
    model="xai.grok-4-1-fast-reasoning",
    input="Write a one-sentence explanation of what a database is."
)

response = client.responses.create(
    model="google.gemini-2.5-pro",
    input="Write a one-sentence explanation of what a database is."
)

On-Demand Models

On-demand models are hosted and managed by OCI and are available without requiring dedicated infrastructure:

response = client.responses.create(
    model="openai.gpt-oss-120b",
    input="Write a one-sentence explanation of what a database is."
)

Dedicated AI Clusters (Dedicated Mode)

For production workloads requiring isolation or predictable performance, you can deploy models on a dedicated AI cluster. In this case, use the cluster endpoint OCID as the model identifier:

response = client.responses.create(
    model="<dedicated-ai-cluster-endpoint-ocid>",
    input="Write a one-sentence explanation of what a database is."
)

This flexibility allows you to select the deployment model that best fits your requirements for performance, cost, and control.

Stream Responses

The OCI Responses API supports streaming, allowing you to receive model outputs incrementally as they are generated. This can improve responsiveness for longer outputs.

Stream All Events

response_stream = client.responses.create(
    model="openai.gpt-oss-120b",
    input="Explain the difference between structured and unstructured data.",
    stream=True
)

for event in response_stream:
    print(event)

Stream Only Text Output (Delta Tokens)

response_stream = client.responses.create(
    model="openai.gpt-oss-120b",
    input="Explain the difference between structured and unstructured data.",
    stream=True
)

for event in response_stream:
    if event.type == "response.output_text.delta":
        print(event.delta, end="", flush=True)

Streaming is especially useful for interactive applications where you want to display responses to users as they're generated.
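The delta-handling pattern can be exercised locally. In this sketch, plain dicts stand in for the SDK's streaming event objects; real events come from a create() call with stream=True:

```python
# Hypothetical event stream standing in for Responses API streaming events.
events = [
    {"type": "response.output_text.delta", "delta": "Structured data has a schema; "},
    {"type": "response.output_text.delta", "delta": "unstructured data does not."},
    {"type": "response.completed"},
]

# Concatenate only the text deltas to reassemble the full output.
full_text = "".join(
    e["delta"] for e in events if e["type"] == "response.output_text.delta"
)
print(full_text)  # Structured data has a schema; unstructured data does not.
```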

Structured Output

In some use cases, you might want the model to return responses in a structured format instead of free-form text. The OCI Responses API supports this by allowing you to define a schema and parse the model output into strongly typed objects.

This approach is useful when integrating with downstream systems, enforcing consistency, or extracting specific fields from natural language input.

from pydantic import BaseModel

class CalendarEvent(BaseModel):
    name: str
    date: str
    participants: list[str]


response = client.responses.parse(
    model="openai.gpt-oss-120b",
    input=[
        {"role": "system", "content": "Extract the event details."},
        {
            "role": "user",
            "content": "The team meeting is scheduled for Monday with Sarah, John, and Priya.",
        },
    ],
    store=False,
    text_format=CalendarEvent,
)

event = response.output_parsed
print(event)

Trace API Calls

When you call the OCI Responses API, the response includes an output field. This field is an array of items that describe what occurred during the request.

Each item represents a step in the execution and can include different types, such as:

  • message
  • web_search_call
  • file_search_call
  • mcp_call
  • mcp_list_tools

These output items provide visibility into how the request was processed. You can use them to:

  • Debug and understand model behavior
  • Display execution steps in a user interface
  • Build custom observability or logging workflows
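For example, a simple local summary of output item types could feed a logging workflow. Here plain dicts stand in for the SDK's typed output objects:

```python
from collections import Counter

# Hypothetical output items; a real response.output contains typed SDK objects.
output_items = [
    {"type": "mcp_list_tools"},
    {"type": "mcp_call"},
    {"type": "message"},
]

# Count how many items of each type occurred during the request.
trace = Counter(item["type"] for item in output_items)
print(dict(trace))  # {'mcp_list_tools': 1, 'mcp_call': 1, 'message': 1}
```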

Integrate with Observability Tools

For deeper insight, such as latency, cost, and execution traces, you can integrate the OCI Responses API with observability platforms.

Many providers support OpenAI-compatible APIs. One such option is Langfuse, an open-source LLM engineering platform that helps developers debug, monitor, and improve LLM applications. It provides end-to-end observability for tracing agent actions, supports prompt versioning, and helps with evaluation of model outputs. Langfuse integrates with popular frameworks such as OpenAI, LangChain, and LlamaIndex.

The following example shows how to use Langfuse to trace and monitor Responses API calls.

Step 1: Install the Langfuse SDK

pip install langfuse

Step 2: Configure Environment Variables

Set the required Langfuse and OCI environment variables:

LANGFUSE_SECRET_KEY="sk-lf-xxxxxxxxx"
LANGFUSE_PUBLIC_KEY="pk-lf-xxxxxxxxx"
LANGFUSE_BASE_URL="https://us.cloud.langfuse.com"

# OCI Generative AI credentials
OCI_GENAI_API_KEY="sk-xxxxxxxxx"
OCI_GENAI_PROJECT_ID="ocid1.generativeaiproject.oc1.xxx"

Step 3: Instrument the OpenAI Client

Import the OpenAI client from the Langfuse SDK. Existing code remains unchanged, but requests are automatically traced.

import os
from langfuse.openai import OpenAI  # Import from Langfuse

client = OpenAI(
    base_url="https://inference.generativeai.us-ashburn-1.oci.oraclecloud.com/openai/v1",
    api_key=os.getenv("OCI_GENAI_API_KEY"),
    project=os.getenv("OCI_GENAI_PROJECT_ID"),
)

# Requests are automatically instrumented by Langfuse
response = client.responses.create(
    model="openai.gpt-oss-120b",
    tools=[
        {
            "type": "mcp",
            "server_label": "dmcp",
            "server_url": "https://mcp.deepwiki.com/mcp",
            "require_approval": "never",
        },
    ],
    input="Explain why tracing and observability are important in distributed systems."
)

print(response.output_text)

This integration provides end-to-end visibility into API calls without requiring significant changes to your application code.

Multimodal Inputs

The OCI Responses API supports models that accept multimodal inputs. You can combine text with images, files, and reasoning controls to support richer workflows such as document analysis, image understanding, and more deliberate model responses.

Image Input as Base64-Encoded Data URL

import base64

def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")

base64_image = encode_image("/path/to/image.png")

response = client.responses.create(
    model="openai.gpt-oss-120b",
    input=[
        {
            "role": "user",
            "content": [
                {"type": "input_text", "text": "Describe the main objects in this image."},
                {
                    "type": "input_image",
                    "image_url": f"data:image/jpeg;base64,{base64_image}",
                    "detail": "high",
                },
            ],
        }
    ],
)

print(response.output_text)

Image Input as Internet-Accessible URL

response = client.responses.create(
    model="openai.gpt-oss-120b",
    store=False,
    input=[
        {
            "role": "user",
            "content": [
                {"type": "input_text", "text": "Describe the scene shown in this image."},
                {
                    "type": "input_image",
                    "image_url": "https://example.photos/id/123",
                },
            ],
        }
    ],
)

print(response.output_text)

Replace the "image_url" with a valid image URL.

File Input as File ID

Important

File ID input is supported only with Google Gemini models. For each request, the combined size of all uploaded PDF files must be under 50 MB, and you can provide a maximum of 10 file IDs. See supported Gemini models.

file = client.files.create(
    file=open("<path-to-file>", "rb"),
    purpose="user_data"
)

response = client.responses.create(
    model="google.gemini-2.5-pro",
    input=[
        {
            "role": "user",
            "content": [
                {
                    "type": "input_file",
                    "file_id": file.id,
                },
                {
                    "type": "input_text",
                    "text": "Summarize this document.",
                },
            ]
        }
    ]
)

print(response.output_text)

File Input as Internet-Accessible URL

response = client.responses.create(
    model="google.gemini-2.5-flash",
    input=[
        {
            "role": "user",
            "content": [
                {"type": "input_text", "text": "Summarize this file."},
                {
                    "type": "input_file",
                    "file_url": "https://www.example.com/letters/example-letter.pdf",
                },
            ],
        }
    ],
)

print(response.output_text)

Reasoning

Reasoning controls let you tune how much effort the model uses before producing a response. This is useful when you want to prioritize speed, depth, or a balance of both.

Reasoning Effort

import json

response = client.responses.create(
    model="openai.gpt-oss-120b",
    input="Solve 18 * (4 + 2).",
    reasoning={"effort": "medium"},
    store=False,
)

print(response.output_text)

Reasoning Summary Output

If you're building a chatbot, enabling reasoning summaries can help users better understand how the model arrived at a result. During streaming, users can also see reasoning tokens while the model is thinking.

import json

response = client.responses.create(
    model="openai.gpt-oss-120b",
    input="Solve 18 * (4 + 2).",
    reasoning={"summary": "auto"},
    store=False,
)

print(response.output_text)

Function Tools

Function tools let the model request data or actions from the client application during a response flow. This is useful when the model needs information that lives outside the prompt itself, such as calendar data, internal application state, or the result of a custom operation.

With this pattern, the model does not execute the function directly. Instead, it signals that a function should be used, the client application performs that work, and then the application sends the result back so the model can continue and produce the user-facing answer.

What This Enables

Function tools are useful when the application needs to stay in control of execution while still allowing the model to decide when outside information is needed.

Typical use cases include:

  • Looking up calendar events
  • Retrieving application data
  • Calling internal or external APIs
  • Running business logic or calculations

This approach gives you flexibility while keeping the execution path inside the application.

Execution Flow

A typical function tool interaction works like this:

  1. The client sends a request that includes one or more tool definitions.
  2. The model decides whether one of those tools is needed.
  3. If a tool is needed, the model returns the tool name and arguments.
  4. The application runs the tool and prepares the result.
  5. The application sends that result back in a follow-up request.
  6. The model uses that result to complete the response.

State Handling Options

There are two common ways to manage state across these requests:

  • Service-managed state: Recommended for most use cases. The follow-up request includes previous_response_id, and the service tracks the earlier exchange.
  • Client-managed state: The application keeps the entire interaction history and sends the accumulated context with each request.
Tip

Keep tool definitions precise. Clear names, accurate descriptions, and well-defined parameters help the model select the right tool and generate usable arguments.

Define a Function Tool

The following example defines a tool that retrieves calendar events for a specified date.

tools = [
    {
        "type": "function",
        "name": "get_calendar_events",
        "description": "Return calendar events scheduled for a specific date.",
        "parameters": {
            "type": "object",
            "properties": {
                "date": {
                    "type": "string",
                    "description": "Date to query, for example 2026-04-02"
                }
            },
            "required": ["date"],
        },
    },
]

Include this tools array in the client.responses.create() request.

Example: Service-Managed State

In this pattern, the first request allows the model to decide whether the tool is needed. The second request sends the tool result back and references the earlier response.

import json

tools = [
    {
        "type": "function",
        "name": "get_calendar_events",
        "description": "Return calendar events scheduled for a specific date.",
        "parameters": {
            "type": "object",
            "properties": {
                "date": {
                    "type": "string",
                    "description": "Date to query, for example 2026-04-02"
                }
            },
            "required": ["date"],
        },
    },
]

def get_calendar_events(date):
    # Replace this with actual calendar logic or an API call
    return [
        {"time": "09:00", "title": "Team standup"},
        {"time": "13:00", "title": "Design review"},
        {"time": "16:00", "title": "Project check-in"},
    ]

# Initial request
response = client.responses.create(
    model="openai.gpt-oss-120b",
    tools=tools,
    input="Show the calendar events for 2026-04-02.",
)

# Execute the requested function
tool_outputs = []
for item in response.output:
    if item.type == "function_call" and item.name == "get_calendar_events":
        args = json.loads(item.arguments)
        events = get_calendar_events(**args)
        tool_outputs.append({
            "type": "function_call_output",
            "call_id": item.call_id,
            "output": json.dumps({"events": events}),
        })

# Follow-up request
final = client.responses.create(
    model="openai.gpt-oss-120b",
    instructions="Summarize the schedule clearly for the user.",
    tools=tools,
    input=tool_outputs,
    previous_response_id=response.id,
)

print(final.output_text)

Example: Client-Managed State

In this pattern, the application keeps the full exchange and resubmits it with the follow-up request.

import json

tools = [
    {
        "type": "function",
        "name": "get_calendar_events",
        "description": "Return calendar events scheduled for a specific date.",
        "parameters": {
            "type": "object",
            "properties": {
                "date": {
                    "type": "string",
                    "description": "Date to query, for example 2026-04-02"
                }
            },
            "required": ["date"],
        },
    },
]

def get_calendar_events(date):
    # Replace this with actual calendar logic or an API call
    return [
        {"time": "09:00", "title": "Team standup"},
        {"time": "13:00", "title": "Design review"},
        {"time": "16:00", "title": "Project check-in"},
    ]

conversation = [
    {"role": "user", "content": "Show the calendar events for 2026-04-02."}
]

response = client.responses.create(
    model="openai.gpt-oss-120b",
    tools=tools,
    input=conversation,
)

conversation += response.output

for item in response.output:
    if item.type == "function_call" and item.name == "get_calendar_events":
        args = json.loads(item.arguments)
        events = get_calendar_events(**args)
        conversation.append({
            "type": "function_call_output",
            "call_id": item.call_id,
            "output": json.dumps({"events": events}),
        })

final = client.responses.create(
    model="openai.gpt-oss-120b",
    instructions="Summarize the schedule clearly for the user.",
    tools=tools,
    input=conversation,
)

print(final.output_text)

Function tools are a strong option when the application must remain responsible for execution, access control, and integration logic, while still allowing the model to request the information it needs.

MCP Tool

The MCP tool allows a model to use capabilities exposed by a remote MCP server during a request. These capabilities can include access to external services, data sources, or application endpoints.

In OCI Generative AI, this capability is available through remote MCP calling, which lets the service interact with an MCP server as part of the model workflow.

When to Use the MCP Tool

Use the MCP tool when the model needs access to external capabilities that are hosted on an MCP server.

This approach is helpful when you want:

  • The service to communicate with the MCP server directly
  • Fewer client-side orchestration steps
  • Lower latency than a client-executed tool pattern
  • Access to capabilities exposed through a remote MCP server

The MCP tool is part of the OCI Generative AI toolset and can be used alongside other supported tools.

Transport Support

Remote MCP calling uses Streamable HTTP.

Define an MCP Tool

Add the MCP server definition in the tools field of the request.

response_stream = client.responses.create(
    model="xai.grok-code-fast-1",
    tools=[
        {
            "type": "mcp",
            "server_label": "calendar",
            "server_description": "An MCP server that retrieves calendar events for a specified date.",
            "server_url": "https://example.com/mcp",
            "require_approval": "never",
        },
    ],
    input="What events are scheduled for 2026-04-02?",
    stream=True,
)

for event in response_stream:
    if event.type == "response.output_text.delta":
        print(event.delta, end="", flush=True)

This example streams the response and prints text as it is generated.
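If the application also needs the complete text after streaming finishes, it can accumulate the deltas while printing them. A minimal sketch, with the stream events stubbed out for illustration (a real stream yields event objects with the same type and delta attributes):

```python
from types import SimpleNamespace

# Stubbed stream events for illustration only; a real response stream
# yields objects with the same .type and .delta attributes.
events = [
    SimpleNamespace(type="response.output_text.delta", delta="Hello, "),
    SimpleNamespace(type="response.output_text.delta", delta="world."),
    SimpleNamespace(type="response.completed", delta=None),
]

chunks = []
for event in events:
    if event.type == "response.output_text.delta":
        print(event.delta, end="", flush=True)  # show text as it arrives
        chunks.append(event.delta)

full_text = "".join(chunks)  # complete text once the stream ends
```

The same accumulation pattern works unchanged when iterating the real response_stream.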

Restrict the Tools Exposed by the MCP Server

If the remote MCP server exposes more tools than the application needs, you can narrow the available set by using allowed_tools.

response_stream = client.responses.create(
    model="openai.gpt-oss-120b",
    tools=[
        {
            "type": "mcp",
            "server_label": "calendar",
            "server_description": "An MCP server that retrieves calendar events for a specified date.",
            "server_url": "https://example.com/mcp",
            "require_approval": "never",
            "allowed_tools": ["get_events"],
        },
    ],
    input="Show the calendar events for 2026-02-02.",
    stream=True,
    store=False,
)

Provide Authentication to the MCP Server

If the remote MCP server requires authentication, pass the access token in the authorization field.

response_stream = client.responses.create(
    model="xai.grok-4-1-fast-reasoning",
    tools=[
        {
            "type": "mcp",
            "server_label": "calendar",
            "server_url": "https://calendar.example.com/mcp",
            "authorization": "$CALENDAR_OAUTH_ACCESS_TOKEN"
        },
    ],
    input="List my meetings for 2026-02-02.",
    stream=True,
    store=False,
)

Pass only the token value. Don't include the Bearer prefix. OCI sends the token over TLS as part of the request and does not decode, inspect, store, or log it.
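To keep the token out of source code, one option is to read it from an environment variable at runtime. A minimal sketch; the variable name CALENDAR_OAUTH_ACCESS_TOKEN is an assumption for this example, not a service requirement:

```python
import os

# Hypothetical variable name; export it before running, for example:
#   export CALENDAR_OAUTH_ACCESS_TOKEN=<token-value>
token = os.environ.get("CALENDAR_OAUTH_ACCESS_TOKEN", "")

calendar_tool = {
    "type": "mcp",
    "server_label": "calendar",
    "server_url": "https://calendar.example.com/mcp",
    # Pass only the token value; no "Bearer " prefix.
    "authorization": token,
}
```

The resulting dictionary can then be passed in the tools list of responses.create.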

MCP Server Hosting

OCI Generative AI also provides MCP server hosting to help deploy and scale MCP servers.

The MCP tool is a good option when you want OCI Generative AI to work directly with a remote MCP server during request execution, without requiring the client application to handle each tool step itself.

Code Interpreter

Code Interpreter lets the model write and run Python code in a secure container. This is useful for tasks such as calculations, data analysis, and file processing.

In prompts, you can call the Code Interpreter tool as the python tool. For example: Use the python tool to solve the problem.

Because the code runs in an isolated environment with no external network access, it's a good option for tasks that need computation or file processing in a controlled setting.

What You Can Use It For

If you're using this for the first time, it helps to think of Code Interpreter as a temporary Python workspace for the model.

You can use it for tasks such as:

  • solving math problems
  • analyzing uploaded files
  • cleaning or transforming data
  • creating charts or tables
  • generating output files such as logs or processed datasets

Execution Environment

The Python environment includes more than 420 preinstalled libraries, so many common tasks work without extra setup.

The code runs inside a container. This container is the working environment where Python runs and where files are stored during the session.

Container Memory Limits

Code Interpreter containers use a shared memory pool of 64 GB per tenancy.

Supported container sizes are:

  • 1 GB
  • 4 GB
  • 16 GB
  • 64 GB

This shared limit can be divided across multiple containers. For example, it could support:

  • sixty-four 1 GB containers
  • sixteen 4 GB containers
  • four 16 GB containers
  • one 64 GB container

If you need more capacity, you can submit a service request.
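The pool arithmetic above can be sketched as a quick feasibility check. This helper is illustrative only and not part of any SDK:

```python
POOL_GB = 64                 # shared memory pool per tenancy
SIZES_GB = {1, 4, 16, 64}    # supported container sizes

def fits(allocations_gb):
    """Return True if the requested container sizes fit in the shared pool."""
    if any(size not in SIZES_GB for size in allocations_gb):
        raise ValueError("unsupported container size")
    return sum(allocations_gb) <= POOL_GB

print(fits([4] * 16))        # sixteen 4 GB containers -> True
print(fits([16] * 4 + [1]))  # 65 GB requested -> False
```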

Container Expiration

A container expires after 20 minutes of inactivity.

This is important to know when building multi-step flows:

  • an expired container can't be reused
  • you must create a new container
  • files must be uploaded again if needed
  • in-memory state, such as Python variables, is lost

Because of this, it's best to treat containers as temporary working environments.
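One way to handle this in application code is to track when the container was last used and create a replacement once the idle window has passed. A minimal sketch; the create_container callable is a stand-in for however your application creates containers:

```python
import time

IDLE_LIMIT_SECONDS = 20 * 60  # containers expire after 20 minutes of inactivity

class ContainerTracker:
    """Tracks container age and swaps in a fresh container once the old
    one has passed the idle limit (expired containers can't be reused)."""

    def __init__(self, create_container):
        self._create = create_container  # callable returning a new container id
        self._container_id = None
        self._last_used = float("-inf")

    def get(self):
        now = time.time()
        if self._container_id is None or now - self._last_used >= IDLE_LIMIT_SECONDS:
            # Expired or never created: start fresh. Files and in-memory
            # state from the old container are gone and must be uploaded
            # or rebuilt in the new one.
            self._container_id = self._create()
        self._last_used = now
        return self._container_id
```

Calling get() before each request returns a usable container id, creating a new container only when the previous one would have expired.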

Example

response = client.responses.create(
    model="xai.grok-4-1-fast-reasoning",
    tools=[
        {
            "type": "code_interpreter",
            "container": {"type": "auto"}
        }
    ],
    instructions="Use the python tool to solve the problem and explain the result.",
    input="Find the value of (18 / 3) + 7 * 2."
)

print(response.output_text)

Containers for Code Interpreter

Code Interpreter needs a container. The container is the isolated environment where the model runs Python code.

A container can hold:

  • uploaded files
  • files created by the model
  • temporary working data during execution

When you use Code Interpreter, you can assign one of two container modes:

  • auto: OCI Generative AI creates the container for you and automatically assigns a container size.
  • container OCID: You create the container yourself, define its size, and provide its OCID in the request.

For both options, the containers are created and managed in OCI Generative AI. The code that runs in those containers also runs in the OCI Generative AI tenancy.

Auto Mode

In auto mode, the service creates the container for you. This is the easiest option and a good starting point for most users.

Use auto mode when:

  • you want OCI Generative AI to manage the container
  • you do not need direct control over the environment
  • you want a simpler setup

response = client.responses.create(
    model="xai.grok-code-fast-1",
    tools=[{
        "type": "code_interpreter",
        "container": {
            "type": "auto"
        }
    }],
    input="Use the python tool to calculate the average of 12, 18, 24, and 30."
)

Explicit Mode

In explicit mode, you create the container first and set the size. Then you pass the container ID in the request.

Use explicit mode when you want more control over the container, such as its memory size.

container = client.containers.create(name="test-container", memory_limit="4g")

response = client.responses.create(
    model="xai.grok-code-fast-1",
    tools=[{
        "type": "code_interpreter",
        "container": container.id
    }],
    tool_choice="required",
    input="Use the python tool to calculate the average of 12, 18, 24, and 30."
)

print(response.output_text)

Files in Code Interpreter

Code Interpreter can work with files during the life of the container. The model can read files you provide and can also create new files.

This is useful for workflows such as:

  • reading a CSV or PDF
  • generating a chart
  • saving processed output
  • creating logs or reports

File Persistence

Files created or changed by the python tool stay available in the same container as long as the container has not expired.

This means the model can build on earlier work in the same session. For example, it can:

  1. read a file
  2. analyze it
  3. save a chart
  4. use that chart later in the same container

When the container expires, that state is no longer available.

Uploading and Managing Files

You can manage container files through the container file APIs.

Common operations include:

  • Create Container File: Add a file to the container
  • List Container Files: View files in the container
  • Delete Container File: Remove a file
  • Retrieve Container File Content: Download a file from the container

This lets you use the container as a temporary workspace for model-driven code execution.

Output Files and Citations

When the model creates files, those files are stored in the container and can be referenced in the response.

These references include:

  • container_id
  • file_id
  • filename

You can use these values to retrieve the generated file content.
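As an illustration of working with those references, the sketch below extracts them from a response shaped like the OpenAI-style container_file_citation annotation. The sample data and exact field layout are assumptions for this example:

```python
# Illustrative response fragment; field names follow the OpenAI-style
# container_file_citation annotation shape and are assumptions here.
output = [
    {
        "type": "message",
        "content": [
            {
                "type": "output_text",
                "text": "Saved the chart to chart.png.",
                "annotations": [
                    {
                        "type": "container_file_citation",
                        "container_id": "cntr_123",
                        "file_id": "cfile_456",
                        "filename": "chart.png",
                    }
                ],
            }
        ],
    }
]

def collect_file_refs(output):
    """Gather (container_id, file_id, filename) for each generated file."""
    refs = []
    for item in output:
        for part in item.get("content", []):
            for ann in part.get("annotations", []):
                if ann.get("type") == "container_file_citation":
                    refs.append((ann["container_id"], ann["file_id"], ann["filename"]))
    return refs

print(collect_file_refs(output))  # -> [('cntr_123', 'cfile_456', 'chart.png')]
```

The collected ids can then be used with the container file APIs to download the generated content.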

Note

The OCI Responses API supports OpenAI-compatible endpoints for features such as Responses, Files, Containers, and Container Files, so the related OpenAI documentation can be used as a reference for request structure, response formats, and general workflows. However, when using these APIs with OCI, send requests to the OCI Generative AI inference endpoints and use OCI authentication; the resources and execution remain in OCI Generative AI, not in an OpenAI tenancy. See Available Endpoints for the base URL and supported endpoints.

NL2SQL Tool

The NL2SQL tool helps enterprise AI agents turn natural language into validated SQL. It is designed for querying federated enterprise data without moving or duplicating the underlying data.

NL2SQL maps business language to database fields, tables, and joins through a semantic enrichment layer. The tool generates SQL only. It does not run the query itself.

Query execution is handled separately by the DBTools MCP Server. That server calls the NL2SQL service to generate SQL, then authorizes and runs the query against the source database by using the end user’s identity and the appropriate guardrails.

To use NL2SQL, create an OCI Semantic Store resource. A semantic store is backed by a vector store with structured data and includes two DBTools connections:

  • Enrichment Connection
  • Query Connection

During setup, you select when enrichment runs:

  • On create: enrichment starts automatically after the semantic store is created
  • Manual: enrichment is triggered later through an API call

The enrichment process reads schema metadata such as tables and columns from the connected database. That metadata is then used to generate more accurate SQL.

After enrichment completes, you can call the GenerateSqlFromNl API to convert natural language input into SQL.

Prerequisites

Before using the NL2SQL tool, create a database and configure the required database connections.

For more information, see:

Permissions for Semantic Stores

To use structured data for NL2SQL and schema-aware querying, set up the required IAM policies before creating the semantic store.

Access to Secrets

Grant the group access to read the secrets used by Database Tools:

allow group <your-group-name> 
to read secret-family in compartment <your-compartment-name> 
where all {request.principal.type='generativeaisemanticstore'}

Access to Database Tools Connections

Grant the group access to the required Database Tools resources:

allow group <your-group-name> 
to use database-tools-family in compartment <compartment-name> 
where all {request.principal.type='generativeaisemanticstore'}
allow group <your-group-name> 
to read database-family in compartment <compartment-name> 
where all {request.principal.type='generativeaisemanticstore'}
allow group <your-group-name> 
to read autonomous-database-family in compartment <compartment-name> 
where all {request.principal.type='generativeaisemanticstore'}

Access to Generative AI Resources

If the broader policy below is already in place, it includes access to semantic store resources:

allow group <your-group-name> 
to manage generative-ai-family in tenancy

If you want narrower access, use the following policy instead to allow the group to create and manage semantic stores:

allow group <your-group-name> 
to manage generative-ai-semantic-store 
in compartment <your-compartment-name>

If the group only needs to use an existing semantic store and call NL2SQL, use these more specific policies:

allow group <your-group-name> 
to use generative-ai-semantic-store 
in compartment <your-compartment-name>
allow group <your-group-name> 
to manage generative-ai-nl2sql 
in compartment <your-compartment-name>

Creating a Semantic Store

A semantic store is a vector store with structured data that points to a database. This task documents the steps for creating one by using the Oracle Cloud Console.

Tip

After you create a vector store, you can view its details and perform other tasks, such as updating it or deleting it. Use the Actions menu (three dots) in the Console to access these tasks.

To create a vector store, in the list page, select Create vector store. If you need help finding the list page, see Listing Vector Stores.

Basic Information

  1. Enter a name for the vector store.
    Start the name with a letter or underscore, followed by letters, numbers, hyphens, or underscores. The length can be 1 to 255 characters.
  2. (Optional) Enter a description for the vector store.
  3. Select a compartment to create the vector store in. The default compartment is the same as the list page, but you can select any compartment that you have permission to work in.

Data Source Type

Select Structured data. This option creates a semantic store for NL2SQL and schema-aware querying.

Structured Data

  1. Under Configure sync connector, select a Connection type.

    Allowed value: OCI Database tool.

  2. Enter the Enrichment connection id, then select Test enrichment connection to verify access.
  3. Enter the Querying connection id, then select Test query connection to verify access.
  4. In Schemas, specify the database schema names to ingest for NL2SQL and schema-aware querying.

Semantic Store API

You can manage semantic stores by using the following APIs:

  • ChangeSemanticStoreCompartment
  • CreateSemanticStore
  • DeleteSemanticStore
  • GetSemanticStore
  • UpdateSemanticStore
  • ListSemanticStores

For enrichment job management, the following APIs are also available:

  • ListEnrichmentJobs
  • GetEnrichmentJob
  • GenerateEnrichmentJob
  • CancelEnrichmentJob

Example: Create and Manage a Semantic Store with Python

import json
import oci
from oci.base_client import BaseClient
from oci.retry import DEFAULT_RETRY_STRATEGY

API_VERSION = "20231130"
HOST = "https://dev.generativeai.us-ashburn-1.oci.oraclecloud.com"
BASE_PATH = f"/{API_VERSION}"


def get_signer_auth_api_key(profile="DEFAULT"):
    config = oci.config.from_file("~/.oci/config", profile)
    signer = oci.signer.Signer(
        tenancy=config["tenancy"],
        user=config["user"],
        fingerprint=config["fingerprint"],
        private_key_file_location=config["key_file"],
        pass_phrase=config.get("pass_phrase"),
    )
    return config, signer


def get_signer_auth_security_token(profile="DEFAULT"):
    config = oci.config.from_file("~/.oci/config", profile)
    # Session (token-based) auth needs the security token and matching
    # private key; SecurityTokenSigner does not accept the config dict.
    with open(config["security_token_file"]) as token_file:
        token = token_file.read()
    private_key = oci.signer.load_private_key_from_file(
        config["key_file"], config.get("pass_phrase")
    )
    signer = oci.auth.signers.SecurityTokenSigner(token, private_key)
    return config, signer


def make_base_client(signer):
    return BaseClient(
        service_endpoint=HOST,
        signer=signer,
        retry_strategy=None,
    )


def create_semantic_store(client, body: dict):
    return client.call_api(
        resource_path=f"{BASE_PATH}/semanticStores",
        method="POST",
        header_params={"content-type": "application/json"},
        body=body,
    )


def update_semantic_store(client, semantic_store_id: str, body: dict):
    return client.call_api(
        resource_path=f"{BASE_PATH}/semanticStores/{semantic_store_id}",
        method="PUT",
        header_params={"content-type": "application/json"},
        body=body,
        retry_strategy=DEFAULT_RETRY_STRATEGY,
    )


def get_semantic_store(client, semantic_store_id: str):
    return client.call_api(
        resource_path=f"{BASE_PATH}/semanticStores/{semantic_store_id}",
        method="GET",
        retry_strategy=DEFAULT_RETRY_STRATEGY,
    )


def delete_semantic_store(client, semantic_store_id: str):
    return client.call_api(
        resource_path=f"{BASE_PATH}/semanticStores/{semantic_store_id}",
        method="DELETE",
        retry_strategy=DEFAULT_RETRY_STRATEGY,
    )


if __name__ == "__main__":
    # Choose one authentication method
    # config, signer = get_signer_auth_api_key(profile="DEFAULT")
    config, signer = get_signer_auth_security_token(profile="DEFAULT")

    client = make_base_client(signer)

    create_body = {
        "displayName": "TestSemanticStore",
        "description": "Semantic store for the ADMIN schema",
        "freeformTags": {},
        "definedTags": {},
        "dataSource": {
            "queryingConnectionId": "ocid1.databasetoolsconnection.oc1.xxx",
            "enrichmentConnectionId": "ocid1.databasetoolsconnection.oc1.xxx",
            "connectionType": "DATABASE_TOOLS_CONNECTION",
        },
        "refreshSchedule": {"type": "ON_CREATE"},
        "compartmentId": "ocid1.tenancy.oc1..xxx",
        "schemas": {
            "connectionType": "DATABASE_TOOLS_CONNECTION",
            "schemas": [{"name": "ADMIN"}],
        },
    }

    create_resp = create_semantic_store(client, create_body)
    print("CREATE status:", create_resp.status)

    create_payload = create_resp.data
    if isinstance(create_payload, (bytes, str)):
        create_payload = json.loads(create_payload)

    print("CREATE response:", json.dumps(create_payload, indent=2))

    semantic_store_id = create_payload.get("id") or "<semantic-store-ocid>"

    update_body = {
        "refreshSchedule": {"type": "ON_CREATE"},
        "schemas": {
            "connectionType": "DATABASE_TOOLS_CONNECTION",
            "schemas": [{"name": "ADMIN"}],
        },
    }

    update_resp = update_semantic_store(client, semantic_store_id, update_body)
    print("UPDATE status:", update_resp.status)
    print("UPDATE response:", update_resp.data)

    get_resp = get_semantic_store(client, semantic_store_id)
    print("GET status:", get_resp.status)
    print("GET response:", get_resp.data)

    delete_resp = delete_semantic_store(client, semantic_store_id)
    print("DELETE status:", delete_resp.status)
    print("DELETE response:", delete_resp.data)

Example: Call the NL2SQL API

After the semantic store is ready and enrichment has completed, you can call the NL2SQL API to generate SQL from natural language.

import json
import oci
from oci.base_client import BaseClient

INFERENCE_BASE_URL = "https://inference.generativeai.<region>.oci.oraclecloud.com"
API_VERSION = "20260325"
SEMANTIC_STORE_ID = "ocid1.generativeaisemanticstore.oc1.xxx"

config = oci.config.from_file("~/.oci/config", "oc1")
# Session (token-based) auth needs the security token and matching
# private key; SecurityTokenSigner does not accept the config dict.
with open(config["security_token_file"]) as token_file:
    token = token_file.read()
private_key = oci.signer.load_private_key_from_file(
    config["key_file"], config.get("pass_phrase")
)
signer = oci.auth.signers.SecurityTokenSigner(token, private_key)

client = BaseClient(
    service_endpoint=INFERENCE_BASE_URL,
    signer=signer,
    retry_strategy=None,
)

resource_path = (
    f"/{API_VERSION}/semanticStores/{SEMANTIC_STORE_ID}/actions/generateSqlFromNl"
)

body = {
    "displayName": "Generate SQL example",
    "description": "Generate SQL from natural language",
    "inputNaturalLanguageQuery": "Give me last week's order details."
}

resp = client.call_api(
    resource_path=resource_path,
    method="POST",
    header_params={"content-type": "application/json"},
    body=body,
)

print("HTTP status:", resp.status)
print("opc-request-id:", resp.headers.get("opc-request-id"))

data = resp.data
if isinstance(data, (bytes, str)):
    data = json.loads(data)

print(json.dumps(data, indent=2))

Multi-Turn Conversations

OCI Generative AI supports multi-turn interactions, so you can build applications that keep context across user turns.

There are two common ways to do this:

  • Responses chaining
  • Conversations API

Responses Chaining

With responses chaining, each new response points to the previous one. This is a simple option when you want to carry context forward without explicitly creating a conversation resource.

# first turn
response1 = client.responses.create(
    model="openai.gpt-oss-120b",
    input="Give me three ideas for a team offsite.",
)
print("Response 1:", response1.output_text)

# second turn
response2 = client.responses.create(
    model="openai.gpt-oss-120b",
    input="Make the second idea more budget friendly.",
    previous_response_id=response1.id,
)
print("Response 2:", response2.output_text)

Conversations API

With the Conversations API, you create a conversation first and then attach responses to that conversation. This is useful when you want a dedicated conversation object that can be reused across turns.

# create a conversation
conversation = client.conversations.create(
    metadata={"topic": "demo"}
)
print("Conversation ID:", conversation.id)

# first turn
response1 = client.responses.create(
    model="openai.gpt-oss-120b",
    input="Give me three ideas for a team offsite.",
    conversation=conversation.id,
)
print("Response 1:", response1.output_text)

# second turn
response2 = client.responses.create(
    model="openai.gpt-oss-120b",
    input="Make the second idea more budget friendly.",
    conversation=conversation.id,
)
print("Response 2:", response2.output_text)