Skip to main content

@robojs/ai

Transform your Robo into an AI-powered copilot with native voice, vision, web search, natural language command execution, token usage tracking, and a pluggable engine architecture. Ships with OpenAI's Responses + Conversations APIs by default while staying ready for custom providers.

GitHub licensenpminstall sizeDiscordAll Contributors

➞ πŸ“š Documentation: Getting started

➞ πŸš€ Community: Join our Discord server

Installation​

Terminal
npx robo add @robojs/ai

New project? Scaffold with the plugin already wired up:

Terminal
npx create-robo my-robo -p @robojs/ai

πŸ’‘ Voice dependencies are optional. Install @discordjs/voice, prism-media, and an Opus encoder such as opusscript or @discordjs/opus if you plan to use voice conversations. The plugin loads them lazily when a voice session starts.

[!IMPORTANT] Set OPENAI_API_KEY (or your provider's equivalent) in the environment so the default OpenAI engine can authenticate.

Interacting with Your AI Robo​

  • Mentions: Ping the bot anywhere. Example: @Sage what's the schedule for tonight?
  • Replies: Continue threads by replying to any bot messageβ€”no extra mention required.
  • Whitelisted Channels: Configure channels for ambient chatting without mentions.
  • Direct Messages: DM the bot and it will respond automatically.
// config/plugins/robojs/ai.mjs
export default {
whitelist: {
channelIds: ['123456789012345678', '234567890123456789']
},
restrict: {
channelIds: ['345678901234567890']
}
}

⚠️ Channel restrictions always win: if restrict is provided, the bot ignores messages from channels not listedβ€”even if they are in the whitelist.

Seed Commands​

During npx robo add @robojs/ai you can optionally scaffold slash commands. They are yours to keep, tweak, or delete.

  • /ai chat (seed/commands/ai/chat.ts)
    • Quick, prompt-based conversations without mentioning the bot.
    • Example: /ai chat message:"What's the capital of France?"
  • /ai imagine (seed/commands/ai/imagine.ts)
    • Text-to-image generation powered by your configured engine.
    • Example: /ai imagine prompt:"A serene mountain landscape at sunset"
  • /ai usage (seed/commands/ai/usage.ts)
    • Token usage dashboard (requires Manage Server). Displays daily, weekly, monthly, and lifetime totals plus configured caps.
  • /voice join (seed/commands/voice/join.ts)
    • Summons the bot into a voice channel for realtime conversation.
    • Example: /voice join channel:"General"

πŸ’‘ Seed scripts only bootstrap starter flows. The bot continues to answer mentions, replies, DMs, and whitelisted channels even if you remove them.

️ Vision Capabilities​

Vision-enabled models (e.g., gpt-4o, gpt-4.1, gpt-5, o1, o3) automatically understand images.

  • Discord attachments are converted to image URLs and forwarded to the engine.
  • Message content URLs with images are processed as additional inputs.
  • No extra config requiredβ€”choose a vision-capable model and you're done.

Example conversation:

User: @Sage can you describe this?
[uploads photo of a coffee setup]
Bot: That looks like a pour-over coffee station with a glass carafe and ceramic dripper. The beans are medium roast...

Programmatic vision call:

import { AI } from '@robojs/ai'

await AI.chat(
[
{
role: 'user',
content: [
{ type: 'text', text: 'What do you see in this image?' },
{
type: 'image_url',
image_url: { url: 'https://example.com/desk-setup.jpg' }
}
]
}
],
{
onReply: ({ text }) => console.log(text)
}
)

[!NOTE] The engine detects capability via isVisionCapableModel() and falls back gracefully for models that cannot process images.

Web Search & Citations​

Enable live web context with built-in citation formatting when using the OpenAI engine.

import { OpenAiEngine } from '@robojs/ai/engines/openai'

export default {
engine: new OpenAiEngine({
webSearch: true
})
}

When active, responses embed inline markers like [1] and append a Sources: footer with clickable links.

The latest LTS release of Node.js is 20.x [1], which introduces performance
improvements to the web streams API [2].

Sources:
[1] Node.js – https://nodejs.org
[2] Release Notes – https://nodejs.org/en/blog/release

πŸ’‘ Citations are injected via injectCitationMarkers() and rendered with formatSourcesLine()β€”no extra formatting needed on your end.

Features​

  • πŸ’¬ Natural Discord conversations with mentions, replies, DMs, and channel whitelists.
  • πŸŽ™οΈ Native voice conversations with realtime transcription and playback.
  • πŸ‘οΈ Vision-aware chats that understand images and URLs.
  • πŸ” Optional web search with automatic citations.
  • πŸ› οΈ Natural language command execution mapped to your slash commands.
  • πŸ“Š Token usage tracking with configurable rate limits.
  • πŸ“š Knowledge sync via vector stores sourced from /documents.
  • πŸ–ΌοΈ Image generation for creative prompts.
  • πŸ”Œ Extensible engine API to plug in other model providers.
  • 🌐 REST endpoint (/api/ai/chat) for web integrations (requires @robojs/server).

Configuration​

Basic Plugin Options​

PluginOptions live in config/plugins/robojs/ai.(mjs|ts):

  • instructions – System prompt or persona description.
  • commands – true, false, or string array allow list for natural language command execution.
  • insight – Enable /documents sync for vector search (default true).
  • restrict – Limit responses to specific channel IDs.
  • whitelist – Allow mention-free chat in selected channels.
  • context – Surrounding context configuration for understanding ongoing conversations (default { enabled: true, depth: 8 }).
  • engine – Instance of any BaseEngine subclass.
  • voice – Voice configuration overrides.
  • usage – Token ledger settings (limits, alerts, hooks).
import { OpenAiEngine } from '@robojs/ai/engines/openai'

export default {
instructions: 'You are Sage, a friendly community mentor who answers succinctly.',
commands: ['ban', 'kick', 'ai log'],
insight: true,
restrict: { channelIds: ['123'] },
whitelist: { channelIds: ['456', '789'] },
context: {
enabled: true,
depth: 8
},
usage: {
limits: [
{
window: 'day',
mode: 'alert',
maxTokens: 200_000
}
]
},
engine: new OpenAiEngine({
chat: {
model: 'gpt-4.1-mini',
temperature: 0.6,
maxOutputTokens: 800
}
})
}

[!NOTE] All engines follow the same BaseEngine contract, so swapping providers is as simple as returning a different engine instance.

Engine Configuration​

The default OpenAiEngine accepts these options:

  • clientOptions – Pass through to the OpenAI SDK (apiKey, baseURL, fetch overrides).
  • chat – Default chat model config (model, temperature, maxOutputTokens, reasoningEffort).
  • voice – Defaults for realtime or TTS models plus transcription settings.
  • webSearch – Enable/disable web search tool.
  • mcp – MCP error handling configuration (see MCP section below).
new OpenAiEngine({
clientOptions: {
apiKey: process.env.OPENAI_API_KEY,
organization: process.env.OPENAI_ORG_ID
},
chat: {
model: 'gpt-5-preview',
reasoningEffort: 'medium',
temperature: 0.2,
maxOutputTokens: 1200
},
voice: {
model: 'gpt-4o-realtime-preview',
transcription: {
model: 'gpt-4o-transcribe-latest',
language: 'en'
}
},
webSearch: true
})

πŸ’‘ Use reasoning models (o1, o3, gpt-5) for complex planning; standard models (gpt-4o, gpt-4.1-mini) keep responses fast and vision-ready.

[!NOTE] Image generation is available via AI.generateImage(options); engine-level defaults for images are not currently configurable through the constructor.

MCP (Model Context Protocol) Support​

MCP tools are server-side proxied by OpenAIβ€”the engine passes MCP configs in the tools array, OpenAI executes them remotely, and results are incorporated into responses automatically.

Configuration​

export default {
mcpServers: [
{
type: 'mcp',
server_label: 'context7',
server_url: 'https://mcp.context7.com/mcp',
headers: { CONTEXT7_API_KEY: process.env.CONTEXT7_API_KEY ?? '' },
allowed_tools: ['resolve-library-id', 'get-library-docs'],
require_approval: 'never'
}
],
mcp: {
gracefulDegradation: true, // Default: true - removes MCPs on persistent failures
extraRetries: 1, // Default: 1 - extra retry attempts before degrading
baseDelayMs: 500, // Default: 500 - base delay for exponential backoff
maxDelayMs: 2000 // Default: 2000 - maximum delay for exponential backoff
}
}

Error Handling & Graceful Degradation​

When MCP tools are present and the Responses API call fails with retryable network errors (after the OpenAI SDK's built-in retries), the engine:

  1. Performs extra retries with exponential backoff (configurable via extraRetries, baseDelayMs, maxDelayMs)
  2. Gracefully degrades by removing MCP tools and retrying once more (if gracefulDegradation is enabled, default: true)
  3. Informs the AI that certain external tools were unavailable, ensuring transparent communication to users

This prevents entire requests from failing due to transient MCP server issues while maintaining user awareness of tool unavailability.

[!NOTE] MCP tools are only available in 'worker' context (standard chat), not in 'realtime' (voice) contexts.

️ Voice Features​

Voice support is built-inβ€”no extra plugin required.

  • enabled – Toggle voice globally (defaults to true).
  • endpointing – Choose 'server-vad' (automatic voice activity detection) or 'manual' (explicit end-of-turn control).
  • model – Override the realtime model.
  • playbackVoice – Select the TTS voice ID.
  • maxConcurrentChannels – Limit simultaneous guild sessions.
  • instructions – Voice-specific system prompt.
  • capture – Audio capture config (channels, sampleRate, silenceDurationMs, vadThreshold).
  • playback – Output sampling configuration.
  • transcript – Embed transcripts (guild channel ID, enable flag).
  • perGuild – Override any of the above per guild.
export default {
voice: {
endpointing: 'server-vad',
maxConcurrentChannels: 2,
playbackVoice: 'alloy',
instructions: 'Keep spoken replies upbeat and under 8 seconds.',
capture: {
sampleRate: 48000,
silenceDurationMs: 600,
vadThreshold: 0.35
},
transcript: {
enabled: true,
targetChannelId: '987654321098765432'
},
perGuild: {
'123456789012345678': {
playbackVoice: 'verse'
}
}
}
}

Voice lifecycle events (session:start, session:stop, config:change, transcript:segment) are available via AI.onVoiceEvent. Transcript embeds require CONNECT, SPEAK, and SEND_MESSAGES permissions.

[!TIP] server-vad is recommendedβ€”it automatically trims silence and hands conversations back to the model without manual end markers.

Token Usage & Limits​

Token accounting is handled by the tokenLedger singleton.

  • Records usage for every AI call (chat, voice, image, custom tools).
  • Groups stats by window: day, week, month, lifetime.
  • Enforces limits in block mode (throws) or alert mode (emits events).
  • Tracks per-model totals, ideal for budgeting across models/providers.
export default {
usage: {
limits: [
{
window: 'month',
mode: 'block',
maxTokens: 1_000_000,
models: ['gpt-4o', 'gpt-4o-mini']
},
{
window: 'day',
mode: 'alert',
maxTokens: 200_000
}
]
}
}
import { AI, tokenLedger, TokenLimitError } from '@robojs/ai'

try {
await AI.chatSync(
[{ role: 'user', content: 'Summarize the meeting notes.' }],
{}
)
} catch (error) {
if (error instanceof TokenLimitError) {
// Notify admins or throttle usage
}
}

const weekly = await AI.getUsageSummary({ window: 'week' })
const lifetime = tokenLedger.getLifetimeTotals()

Usage events:

tokenLedger.on('usage.recorded', payload => {
console.log('Usage recorded:', payload)
})

tokenLedger.on('usage.limitReached', payload => {
// Alert your team via webhook/DM
})

⚠️ Exceeding a block mode limit throws TokenLimitError; the /ai usage command helps server admins monitor usage directly inside Discord.

JavaScript API Reference​

AI Singleton​

  • AI.chat(messages: ChatMessage[], options: ChatOptions) – Streaming conversation helper acknowledging Discord context. messages is the array of chat turns (system, user, assistant). options extends engine chat settings with Discord context and requires onReply.
  • AI.chatSync(messages: ChatMessage[], options: Omit<ChatOptions, 'onReply'>) – Promise that resolves with the first assistant reply, suitable for HTTP handlers or scripts.
  • AI.generateImage(options) – Image creation via configured engine.
  • AI.isReady() – Checks engine initialization state.
  • AI.getActiveTasks(channelId?) – Returns active background task snapshots.

Voice Controls​

  • AI.startVoice({ guildId, channelId, textChannelId? })
  • AI.stopVoice({ guildId, channelId? })
  • AI.setVoiceConfig({ patch, guildId? })
  • AI.getVoiceStatus(guildId?)
  • AI.getVoiceMetrics()
  • AI.onVoiceEvent(event, listener) / AI.offVoiceEvent(event, listener)

Usage Tracking​

  • AI.getUsageSummary(query) – Aggregated usage per window.
  • AI.getLifetimeUsage(model?) – Lifetime totals.
  • AI.onUsageEvent(event, listener) / AI.onceUsageEvent(event, listener) / AI.offUsageEvent(event, listener).

Channel Management​

  • AI.addWhitelistChannel(channelId) – Adds a channel to the whitelist at runtime.
  • AI.removeWhitelistChannel(channelId) – Removes a channel from the whitelist.
  • AI.addRestrictChannel(channelId) – Adds a channel to the restrict list at runtime.
  • AI.removeRestrictChannel(channelId) – Removes a channel from the restrict list.
  • AI.getWhitelistChannels() – Returns array of whitelisted channel IDs.
  • AI.getRestrictChannels() – Returns array of restricted channel IDs.
import { AI } from '@robojs/ai'

// Dynamically whitelist a channel based on an event
AI.addWhitelistChannel('123456789012345678')

// Check current whitelist
const whitelisted = AI.getWhitelistChannels()
console.log('Whitelisted channels:', whitelisted)

// Remove from whitelist when no longer needed
AI.removeWhitelistChannel('123456789012345678')

[!NOTE] Runtime changes are not persisted and will be lost on restart. The config file takes precedence.

Token Ledger Direct Access​

  • tokenLedger.recordUsage(entry)
  • tokenLedger.getSummary(query)
  • tokenLedger.getLifetimeTotals(model?)
  • tokenLedger.configure(config)
  • tokenLedger.getLimits() / tokenLedger.setLimits(limits)
  • tokenLedger.getLimitState(model?)

️ Command Execution​

Natural language commands are routed to your registered slash commands via tool calls.

  • Permission checks mimic Discord's native rules (roles, DM ban lists).
  • Sage deferral kicks in after deferBuffer (default 3s) to avoid interaction timeouts.
  • Background jobs show typing indicators and follow-up messages until completion.
// Example snippet from a custom tool handler
AI.chat(messages, {
channel,
member,
onReply: reply => {
if (reply.type === 'command-executed') {
console.log('Command dispatched:', reply.command)
}
}
})

[!IMPORTANT] Keep your slash command permissions in sync. The AI respects them but cannot override Discord's permission model.

Insights (Vector Store)​

  • Drop files into /documents and Robo syncs them to a vector store on startup.
  • Supports all file types listed in OpenAI file search docs.
  • Hash-based diffing uploads only changed files.
  • Disable by setting insight: false in the plugin config.
export default {
insight: false
}

πŸ’‘ Remind the bot to reference key documents by mentioning them in instructions for higher recall accuracy.

Web API​

Pair with @robojs/server to expose REST endpoints.

POST /api/ai/chat
  • Request body: { messages: ChatMessage[] }
  • Response body: { message: string }
// src/api/ai/chat.ts
import { AI } from '@robojs/ai'
import type { ChatMessage } from '@robojs/ai'
import type { RoboRequest } from '@robojs/server'

export default async function handler(req: RoboRequest) {
const { messages } = await req.json<{ messages: ChatMessage[] }>()

if (!messages?.length) {
return { message: '' }
}

const reply = await AI.chatSync(messages, {})

return { message: reply.text ?? '' }
}

[!NOTE] Install @robojs/server via npx robo add @robojs/server to enable the route.

Hooks​

Intercept and modify the AI pipeline with global hooks.

  • chat – Pre-process messages before they reach the engine.
  • reply – Post-process the final response, access MCP tool usage, and override the output.
// config/plugins/robojs/ai.ts
import type { ReplyHookContext, ChatReply } from '@robojs/ai'

export default {
hooks: {
reply: (context: ReplyHookContext): ChatReply | void => {
const { response, mcpCalls, degradedMcpServers } = context

// Log MCP tool usage
if (mcpCalls?.length) {
console.log('MCP Tools used:', mcpCalls.map(c => c.name))
}

// Handle MCP degradation notifications
if (degradedMcpServers && degradedMcpServers.length > 0) {
console.warn('MCP servers were degraded:', degradedMcpServers)
// You can apply post-processing, send alerts, etc.
}

// Modify the response if needed
if (response.mcpCalls?.some(call => call.serverLabel === 'my-secure-server')) {
return {
text: response.message?.content + '\n\nπŸ”’ *Verified Secure Response*',
// You can also add components/embeds here
}
}
}
}
}

Custom Engine Development​

Create your own engine by extending the abstract base class.

import { BaseEngine } from '@robojs/ai'
import type {
ChatOptions,
ChatResult,
VoiceSessionHandle,
VoiceSessionStartOptions
} from '@robojs/ai/engines/base'

class MyEngine extends BaseEngine {
async init() {
// Warm up connections, perform auth
}

async chat(options: ChatOptions): Promise<ChatResult> {
// Call your provider and return the assistant response
return {
messages: [
{
role: 'assistant',
content: 'Hello from a custom provider!'
}
]
}
}

supportedFeatures() {
return {
voice: false,
image: true,
webSearch: false
}
}
}

export default new MyEngine()

Mandatory methods: init(), chat(), supportedFeatures(). Optional overrides include generateImage(), startVoiceSession(), stopVoiceSession(), and summarizeToolResult(). Engines emit hooks via engine.on(event, listener) if you want to observe chat flow or tool usage.

[!TIP] The engine lifecycle is production readyβ€”use it to integrate Azure OpenAI, Anthropic, Llama, or on-prem solutions.

️ Troubleshooting​

  • Missing voice audio – Confirm optional voice dependencies are installed and the bot has CONNECT + SPEAK permissions.
  • Bot ignores channels – Check restrict and whitelist configuration; restrict takes precedence.
  • TokenLimitError thrown – You exceeded a block mode limit. Lower usage or switch to alert.
  • High latency – Large insight datasets can slow responses; prune /documents or increase engine model speed (e.g., gpt-4.1-mini).
  • Voice quality issues – Tune capture.sampleRate and endpointing strategy. server-vad smooths out long silences.
  • Transcript embeds missing – Ensure the transcript channel exists and the bot can send messages there.

[!HELP] Still stuck? Enable debug logging via ROBO_LOG_LEVEL=debug and watch for detailed engine traces.

Got questions?​

If you need help, hop into our community Discord. We love to chat, and Sage (our resident AI Robo) is always online to assist.

➞ πŸš€ Community: Join our Discord server

More on GitHub​