Build Conversational AI Applications with Vision and Voice in Under 30 Minutes


Discover how Orga AI simplifies the creation of real-time conversational AI applications that can see, hear, and talk. Learn to integrate vision, speech-to-text, language models, and text-to-speech with its unified SDKs, getting your app running in minutes.

Building AI applications that can process video feeds, understand user speech, and respond naturally typically involves integrating multiple complex services: a speech-to-text API, a vision model, a language model, a text-to-speech service, WebRTC for real-time streaming, and WebSockets for low-latency communication. Orchestrating these components, managing synchronization, and ensuring minimal latency (ideally under two seconds) can be a significant challenge.
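To make that orchestration burden concrete, here is a minimal sketch of the hand-wired pipeline. Every function is a stub standing in for a separate vendor API call; the names and return values are illustrative, not real services.

```typescript
// Illustrative stubs only: each stands in for a separate vendor API call.
const speechToText = async (_audio: string) => 'my hub keeps blinking orange';
const describeFrame = async (_frame: string) => 'a smart hub with a blinking orange LED';
const languageModel = async (heard: string, seen: string) =>
  `User said "${heard}" while showing ${seen}.`;
const textToSpeech = async (reply: string) => `<audio>${reply}</audio>`;

// Four sequential network round-trips: this is where a 2-second budget goes.
async function respond(audio: string, frame: string): Promise<string> {
  const transcript = await speechToText(audio);
  const scene = await describeFrame(frame);
  const reply = await languageModel(transcript, scene);
  return textToSpeech(reply);
}
```

Each `await` above is a full round-trip to a different provider, and none of this yet handles streaming, interruption, or synchronization between the audio and video tracks.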

This is precisely the problem Orga AI solves.

Orga AI offers unified SDKs and a seamless API flow, integrating vision, voice, and conversational AI processing in under 700 milliseconds. With out-of-the-box support for over 40 languages, you can get a powerful, real-time conversational AI running in under 30 minutes.

What Is Orga AI?

Orga AI is a real-time conversational AI platform designed to enable natural interaction. Users can turn on their camera, speak naturally, and show the AI what's happening. The AI then watches, listens, and responds with its voice in their preferred language.

Consider this example:

User (pointing their phone camera): "My smart hub won’t connect. The light keeps blinking orange."

Orga AI (watching the blinking pattern): "I see that orange blink — your hub lost its network settings. Show me the back and I’ll walk you through a reset."

This eliminates the need for typing, screenshots, or detailed verbal descriptions. The AI directly perceives the problem and guides the user to a solution, much like a human colleague would. This seamless experience is what you can deliver with Orga.

Getting Started with the Orga SDKs

Orga provides a comprehensive suite of SDKs for both client and server-side integration, designed to connect your application to its powerful APIs. The Orga client SDKs seamlessly integrate with any React-based framework. Choose your preferred stack and follow these steps.

Next.js (Fastest Setup)

Our Next.js starter template provides a complete application scaffold with video, audio, and AI conversation pre-configured.

npx @orga-ai/create-orga-next-app my-app
cd my-app
npm install
npm run dev

Open http://localhost:3000 to see a working demo with a camera preview, voice input, and AI responses. From there, you can customize the AI's personality, integrate your own logic, and deploy your application.

Backend Proxy (Node.js)

To securely manage your API key, it's recommended to create a small backend service. This service acts as a proxy between your client application and the Orga API, fetching ICE servers and an ephemeral token. Your client SDK (React or React Native) will call this endpoint before establishing its connection.

import 'dotenv/config';
import express from 'express';
import cors from 'cors';
import { OrgaAI } from '@orga-ai/node';

const app = express();
app.use(cors());

const orga = new OrgaAI({
  apiKey: process.env.ORGA_API_KEY!
});

app.get('/api/orga-client-secrets', async (_req, res) => {
  try {
    const { ephemeralToken, iceServers } = await orga.getSessionConfig();
    res.json({ ephemeralToken, iceServers });
  } catch (error) {
    console.error('Failed to get session config:', error);
    res.status(500).json({ error: 'Internal server error' });
  }
});

app.listen(5000, () => console.log('Proxy running on http://localhost:5000'));

Once this backend endpoint is operational, your frontend SDK can call it to retrieve the necessary session configuration without exposing your API key directly in the client.
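The response shape the client expects can be captured in a small type. This is a sketch inferred from the fields used in this guide; the actual payload may carry additional fields, so check the Orga docs before relying on it.

```typescript
// Shape of the proxy response, inferred from the fields this guide uses.
interface SessionConfig {
  ephemeralToken: string;
  iceServers: Array<{ urls: string | string[]; username?: string; credential?: string }>;
}

// Minimal runtime guard before handing the payload to the client SDK.
function isSessionConfig(value: unknown): value is SessionConfig {
  const v = value as Partial<SessionConfig> | null;
  return typeof v?.ephemeralToken === 'string' && Array.isArray(v?.iceServers);
}
```

A guard like this makes proxy misconfigurations fail loudly in the client instead of surfacing later as an opaque connection error.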

React (Vite, Create React App, or Custom Setup)

If you already have a React project, integrating the Orga AI SDK is straightforward.

npm install @orga-ai/react

When setting up the provider, you'll specify the backend endpoint your client SDK should call. This endpoint securely provides the ephemeral token and ICE servers, which the SDK then uses to establish a connection to Orga without exposing your API key.

'use client'
import { OrgaAI, OrgaAIProvider } from '@orga-ai/react';

OrgaAI.init({
  logLevel: 'debug',
  model: 'orga-1-beta',
  voice: 'alloy',
  fetchSessionConfig: async () => {
    const res = await fetch('http://localhost:5000/api/orga-client-secrets');
    if (!res.ok) throw new Error('Failed to fetch session config');
    const { ephemeralToken, iceServers } = await res.json();
    return { ephemeralToken, iceServers };
  },
});

export function OrgaClientProvider({ children }: { children: React.ReactNode }) {
  return <OrgaAIProvider>{children}</OrgaAIProvider>;
}

Next, wrap your application with the OrgaClientProvider. The example below uses a Next.js App Router layout; in a Vite or Create React App project, wrap your root <App /> component in main.tsx instead (the 'use client' directive is a no-op outside Next.js):

import type { ReactNode } from 'react';
import { OrgaClientProvider } from './providers/OrgaClientProvider';

export default function RootLayout({ children }: { children: ReactNode }) {
  return (
    <html lang="en">
      <body>
        <OrgaClientProvider>
          {children}
        </OrgaClientProvider>
      </body>
    </html>
  );
}

You are now ready to create your main component and import the useOrgaAI hook. This hook exposes everything you need, including startSession(), endSession(), and real-time state such as connectionState.

'use client'
import {
  useOrgaAI,
  OrgaVideo,
  OrgaAudio,
} from '@orga-ai/react';

export default function Home() {
  const {
    startSession,
    endSession,
    connectionState,
    toggleCamera,
    toggleMic,
    isCameraOn,
    isMicOn,
    userVideoStream,
    aiAudioStream,
  } = useOrgaAI();

  const isConnected = connectionState === 'connected';
  const isIdle = connectionState === 'disconnected';

  return (
    <main className="mx-auto flex max-w-2xl flex-col gap-6 p-8">
      <header>
        <h1 className="text-3xl font-bold">Orga React SDK Quick Start</h1>
        <p className="text-gray-600">Status: {connectionState}</p>
      </header>
      <section className="grid grid-cols-2 gap-4">
        <button
          className="rounded bg-blue-600 px-4 py-2 text-white disabled:opacity-50"
          disabled={!isIdle}
          onClick={() => startSession()}
        >
          Start Session
        </button>
        <button
          className="rounded bg-red-600 px-4 py-2 text-white disabled:opacity-50"
          disabled={!isConnected}
          onClick={() => endSession()}
        >
          End Session
        </button>
        <button
          className="rounded border px-4 py-2 disabled:opacity-50"
          disabled={!isConnected}
          onClick={toggleCamera}
        >
          {isCameraOn ? 'Camera On' : 'Camera Off'}
        </button>
        <button
          className="rounded border px-4 py-2 disabled:opacity-50"
          disabled={!isConnected}
          onClick={toggleMic}
        >
          {isMicOn ? 'Mic On' : 'Mic Off'}
        </button>
      </section>
      <OrgaVideo stream={userVideoStream} className="h-64 w-full rounded bg-black" />
      <OrgaAudio stream={aiAudioStream} />
    </main>
  );
}

React Native (Expo)

For mobile development, Orga AI offers the same powerful API with native performance. The setup is nearly identical to the React web example, with only minor differences in SDK import and initialization.

First, install the necessary dependencies:

npm install @orga-ai/react-native react-native-webrtc react-native-incall-manager

For Expo projects, remember to update app.json to request camera and microphone permissions:

{
  "expo": {
    "ios": {
      "infoPlist": {
        "NSCameraUsageDescription": "Allow $(PRODUCT_NAME) to access your camera",
        "NSMicrophoneUsageDescription": "Allow $(PRODUCT_NAME) to access your microphone"
      }
    },
    "android": {
      "permissions": [
        "android.permission.CAMERA",
        "android.permission.RECORD_AUDIO"
      ]
    }
  }
}

Similar to the React web setup, define your provider to tell the client SDK which backend endpoint to call for the ephemeral token and ICE servers.

import { Stack } from 'expo-router';
import { OrgaAI, OrgaAIProvider } from '@orga-ai/react-native';

OrgaAI.init({
  logLevel: 'debug',
  model: 'orga-1-beta',
  voice: 'alloy',
  fetchSessionConfig: async () => {
    const response = await fetch('http://localhost:5000/api/orga-client-secrets');
    if (!response.ok) throw new Error('Failed to fetch session config');
    const { ephemeralToken, iceServers } = await response.json();
    return { ephemeralToken, iceServers };
  },
});

export default function RootLayout() {
  return (
    <OrgaAIProvider>
      <Stack />
    </OrgaAIProvider>
  );
}

Finally, create your main file and import the useOrgaAI hook, which exposes startSession(), endSession(), connectionState, and other real-time interaction controls.

import { StyleSheet, View } from 'react-native';
import {
  OrgaAICameraView,
  OrgaAIControls,
  useOrgaAI,
} from '@orga-ai/react-native';

export default function HomeScreen() {
  const {
    connectionState,
    isCameraOn,
    isMicOn,
    userVideoStream,
    startSession,
    endSession,
    toggleCamera,
    toggleMic,
    flipCamera,
  } = useOrgaAI();

  return (
    <View style={styles.container}>
      <OrgaAICameraView
        streamURL={userVideoStream?.toURL()}
        containerStyle={styles.cameraContainer}
        style={{
          width: '100%',
          height: '100%'
        }}
      >
        <OrgaAIControls
          connectionState={connectionState}
          isCameraOn={isCameraOn}
          isMicOn={isMicOn}
          onStartSession={startSession}
          onEndSession={endSession}
          onToggleCamera={toggleCamera}
          onToggleMic={toggleMic}
          onFlipCamera={flipCamera}
        />
      </OrgaAICameraView>
    </View>
  );
}

const styles = StyleSheet.create({
  container: {
    flex: 1,
    backgroundColor: '#0f172a'
  },
  cameraContainer: {
    width: '100%',
    height: '100%'
  },
});

Unlike React for the web, building with React Native and Expo involves a few mobile-specific considerations:

  1. Permissions: WebRTC and audio features require native access to the device’s camera and microphone, so configure the permissions in your app.json as shown above.
  2. Dev client: because Orga relies on native modules, the SDK will not run within Expo Go; you will need to build and use a dev client.
  3. Networking: when developing locally, ensure your backend endpoint is accessible from your mobile device, either via your local network IP (e.g., http://192.168.x.x:5000) or a tunneling tool like ngrok for testing.
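One way to handle the networking point is to pick the endpoint per build. The helper below is a sketch; every host in it is a placeholder, so substitute your machine's LAN IP, an ngrok URL, or your production domain. In Expo, the global __DEV__ flag can supply the isDev argument.

```typescript
// Sketch: choose the session-config endpoint per environment. All hosts below
// are placeholders; replace them with your own LAN IP, ngrok URL, or domain.
function sessionConfigUrl(isDev: boolean): string {
  return isDev
    ? 'http://192.168.1.20:5000/api/orga-client-secrets' // placeholder LAN IP
    : 'https://api.example.com/api/orga-client-secrets'; // placeholder production host
}
```

You would then call sessionConfigUrl(__DEV__) inside fetchSessionConfig instead of hard-coding http://localhost:5000, which a physical device cannot reach.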

For more comprehensive setup guides, troubleshooting tips, and in-depth examples for React, React Native, and Node.js, refer to the full Orga AI documentation.

What Can You Build?

Orga AI delivers natural, human-like voice and timing, making interactions feel intuitive and trustworthy. Users often forget they are talking to an AI, which lets them communicate naturally and rely on its responses with confidence. Here are a few examples of what you can create:

  1. Customer Support: A user points their phone at a router. The AI quickly identifies a red light on port 3 and advises: "That port has a hardware fault. Try a different cable." The AI observes and diagnoses, eliminating the need for verbose descriptions.
  2. Accessibility: A visually impaired user holds their phone in a coffee shop. "What's on the menu?" The AI reads the menu board aloud. Later, they ask, "Is my coffee ready?" The AI spots their name on a cup at the counter.
  3. Field Service: A technician points their phone at an unfamiliar control panel. "Where's the reset switch?" The AI locates it behind a small cover on the left and provides hands-free, step-by-step guidance for the reset procedure.

If your application requires users to both show something and discuss it, Orga AI is designed for that very purpose.

Make It Yours with a System Prompt

Every Orga AI agent begins with a system prompt—simple text that dictates the AI's behavior and personality. Whether you need a friendly support representative, a strict safety checker, or a patient tutor, you can define it precisely:

You are a technician for HomeHelp.
Speak calmly. When users show a broken device, identify the model first, then guide them through the fix.
If you see exposed wires or water damage, tell them to call a professional.
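In code, the prompt from this section is just a string you assemble and hand to the SDK at initialization. How it is passed is an assumption here: the `instructions` option shown in the comment is a guess, so check the Orga AI docs for the actual field name.

```typescript
// Assembling the system prompt from this section as a single string.
const SYSTEM_PROMPT = [
  'You are a technician for HomeHelp.',
  'Speak calmly. When users show a broken device, identify the model first, then guide them through the fix.',
  'If you see exposed wires or water damage, tell them to call a professional.',
].join('\n');

// Hypothetical wiring; the `instructions` option name is an assumption:
// OrgaAI.init({ model: 'orga-1-beta', voice: 'alloy', instructions: SYSTEM_PROMPT });
```

Because the prompt is plain text, you can version it, A/B test it, or swap it per customer without touching the rest of your integration.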

Start Building Today

Stop spending time wiring services together; start building your product instead. Create a free account at platform.orga-ai.com to build and test a full prototype.

Here’s how to get started:

  1. Create a free account at platform.orga-ai.com and grab your API key.
  2. Scaffold a project with the Next.js starter, or install the React or React Native SDK into an existing app.
  3. Stand up the small Node.js proxy so your client can fetch an ephemeral token and ICE servers without exposing your API key.
  4. Define a system prompt that gives your agent its personality, then start a session.

What will you create?