# @robojs/ai
Transform your Robo into an AI-powered copilot with native voice, vision, web search, natural language command execution, token usage tracking, and a pluggable engine architecture. Ships with OpenAI's Responses + Conversations APIs by default while staying ready for custom providers.
- **Documentation:** Getting started
- **Community:** Join our Discord server
## Installation

```bash
npx robo add @robojs/ai
```

New project? Scaffold with the plugin already wired up:

```bash
npx create-robo my-robo -p @robojs/ai
```

> 💡 Voice dependencies are optional. Install `@discordjs/voice`, `prism-media`, and an Opus encoder such as `opusscript` or `@discordjs/opus` if you plan to use voice conversations. The plugin loads them lazily when a voice session starts.
> [!IMPORTANT]
> Set `OPENAI_API_KEY` (or your provider's equivalent) in the environment so the default OpenAI engine can authenticate.
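For local development, a `.env` file at the project root is a convenient place for it (the value below is a placeholder):

```bash
# .env
OPENAI_API_KEY=sk-your-key-here
```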
## Interacting with Your AI Robo
- **Mentions:** Ping the bot anywhere. Example: `@Sage what's the schedule for tonight?`
- **Replies:** Continue threads by replying to any bot message; no extra mention required.
- **Whitelisted Channels:** Configure channels for ambient chatting without mentions.
- **Direct Messages:** DM the bot and it will respond automatically.
```js
// config/plugins/robojs/ai.mjs
export default {
	whitelist: {
		channelIds: ['123456789012345678', '234567890123456789']
	},
	restrict: {
		channelIds: ['345678901234567890']
	}
}
```
> ⚠️ Channel restrictions always win: if `restrict` is provided, the bot ignores messages from channels not listed, even if they are in the whitelist.
## Seed Commands
During `npx robo add @robojs/ai` you can optionally scaffold slash commands. They are yours to keep, tweak, or delete.
- **`/ai chat`** (`seed/commands/ai/chat.ts`)
  - Quick, prompt-based conversations without mentioning the bot.
  - Example: `/ai chat message:"What's the capital of France?"`
- **`/ai imagine`** (`seed/commands/ai/imagine.ts`)
  - Text-to-image generation powered by your configured engine.
  - Example: `/ai imagine prompt:"A serene mountain landscape at sunset"`
- **`/ai usage`** (`seed/commands/ai/usage.ts`)
  - Token usage dashboard (requires Manage Server). Displays daily, weekly, monthly, and lifetime totals plus configured caps.
- **`/voice join`** (`seed/commands/voice/join.ts`)
  - Summons the bot into a voice channel for realtime conversation.
  - Example: `/voice join channel:"General"`
> 💡 Seed scripts only bootstrap starter flows. The bot continues to answer mentions, replies, DMs, and whitelisted channels even if you remove them.
## Vision Capabilities
Vision-enabled models (e.g., `gpt-4o`, `gpt-4.1`, `gpt-5`, `o1`, `o3`) automatically understand images.
- Discord attachments are converted to image URLs and forwarded to the engine.
- Message content URLs with images are processed as additional inputs.
- No extra config required: choose a vision-capable model and you're done.
Example conversation:

```text
User: @Sage can you describe this?
      [uploads photo of a coffee setup]
Bot:  That looks like a pour-over coffee station with a glass carafe and ceramic dripper. The beans are medium roast...
```
Programmatic vision call:

```ts
import { AI } from '@robojs/ai'

await AI.chat(
	[
		{
			role: 'user',
			content: [
				{ type: 'text', text: 'What do you see in this image?' },
				{
					type: 'image_url',
					image_url: { url: 'https://example.com/desk-setup.jpg' }
				}
			]
		}
	],
	{
		onReply: ({ text }) => console.log(text)
	}
)
```
> [!NOTE]
> The engine detects capability via `isVisionCapableModel()` and falls back gracefully for models that cannot process images.
## Web Search & Citations
Enable live web context with built-in citation formatting when using the OpenAI engine.
```js
import { OpenAiEngine } from '@robojs/ai/engines/openai'

export default {
	engine: new OpenAiEngine({
		webSearch: true
	})
}
```
When active, responses embed inline markers like `[1]` and append a `Sources:` footer with clickable links.
```text
The latest LTS release of Node.js is 20.x [1], which introduces performance
improvements to the web streams API [2].

Sources:
[1] Node.js – https://nodejs.org
[2] Release Notes – https://nodejs.org/en/blog/release
```
> 💡 Citations are injected via `injectCitationMarkers()` and rendered with `formatSourcesLine()`; no extra formatting is needed on your end.
## Features
- Natural Discord conversations with mentions, replies, DMs, and channel whitelists.
- Native voice conversations with realtime transcription and playback.
- Vision-aware chats that understand images and URLs.
- Optional web search with automatic citations.
- Natural language command execution mapped to your slash commands.
- Token usage tracking with configurable rate limits.
- Knowledge sync via vector stores sourced from `/documents`.
- Image generation for creative prompts.
- Extensible engine API to plug in other model providers.
- REST endpoint (`/api/ai/chat`) for web integrations (requires `@robojs/server`).
## Configuration
### Basic Plugin Options
`PluginOptions` live in `config/plugins/robojs/ai.(mjs|ts)`:
- `instructions` – System prompt or persona description.
- `commands` – `true`, `false`, or a string array allow list for natural language command execution.
- `insight` – Enable `/documents` sync for vector search (default `true`).
- `restrict` – Limit responses to specific channel IDs.
- `whitelist` – Allow mention-free chat in selected channels.
- `engine` – Instance of any `BaseEngine` subclass.
- `voice` – Voice configuration overrides.
- `usage` – Token ledger settings (limits, alerts, hooks).
```js
import { OpenAiEngine } from '@robojs/ai/engines/openai'

export default {
	instructions: 'You are Sage, a friendly community mentor who answers succinctly.',
	commands: ['ban', 'kick', 'ai log'],
	insight: true,
	restrict: { channelIds: ['123'] },
	whitelist: { channelIds: ['456', '789'] },
	usage: {
		limits: [
			{
				window: 'day',
				mode: 'alert',
				maxTokens: 200_000
			}
		]
	},
	engine: new OpenAiEngine({
		chat: {
			model: 'gpt-4.1-mini',
			temperature: 0.6,
			maxOutputTokens: 800
		}
	})
}
```
> [!NOTE]
> All engines follow the same `BaseEngine` contract, so swapping providers is as simple as returning a different engine instance.
### Engine Configuration
The default `OpenAiEngine` accepts these options:
- `clientOptions` – Pass through to the OpenAI SDK (`apiKey`, `baseURL`, fetch overrides).
- `chat` – Default chat model config (`model`, `temperature`, `maxOutputTokens`, `reasoningEffort`).
- `voice` – Defaults for realtime or TTS models plus transcription settings.
- `webSearch` – Enable/disable the web search tool.
```js
new OpenAiEngine({
	clientOptions: {
		apiKey: process.env.OPENAI_API_KEY,
		organization: process.env.OPENAI_ORG_ID
	},
	chat: {
		model: 'gpt-5-preview',
		reasoningEffort: 'medium',
		temperature: 0.2,
		maxOutputTokens: 1200
	},
	voice: {
		model: 'gpt-4o-realtime-preview',
		transcription: {
			model: 'gpt-4o-transcribe-latest',
			language: 'en'
		}
	},
	webSearch: true
})
```
> 💡 Use reasoning models (`o1`, `o3`, `gpt-5`) for complex planning; standard models (`gpt-4o`, `gpt-4.1-mini`) keep responses fast and vision-ready.
> [!NOTE]
> Image generation is available via `AI.generateImage(options)`; engine-level defaults for images are not currently configurable through the constructor.
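The exact options object is engine-specific; a minimal sketch, assuming a `prompt` field like the one the `/ai imagine` seed command accepts, could look like this:

```ts
import { AI } from '@robojs/ai'

// Minimal sketch: assumes the options accept a `prompt` string (mirroring the
// `/ai imagine` seed command). Check your engine's types for the exact shape
// of both the options and the returned result.
const result = await AI.generateImage({
	prompt: 'A serene mountain landscape at sunset'
})
```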
## Voice Features
Voice support is built in; no extra plugin is required.
- `enabled` – Toggle voice globally (defaults to `true`).
- `endpointing` – Choose `'server-vad'` (automatic voice activity detection) or `'manual'` (explicit end-of-turn control).
- `model` – Override the realtime model.
- `playbackVoice` – Select the TTS voice ID.
- `maxConcurrentChannels` – Limit simultaneous guild sessions.
- `instructions` – Voice-specific system prompt.
- `capture` – Audio capture config (`channels`, `sampleRate`, `silenceDurationMs`, `vadThreshold`).
- `playback` – Output sampling configuration.
- `transcript` – Embed transcripts (guild channel ID, enable flag).
- `perGuild` – Override any of the above per guild.
```js
export default {
	voice: {
		endpointing: 'server-vad',
		maxConcurrentChannels: 2,
		playbackVoice: 'alloy',
		instructions: 'Keep spoken replies upbeat and under 8 seconds.',
		capture: {
			sampleRate: 48000,
			silenceDurationMs: 600,
			vadThreshold: 0.35
		},
		transcript: {
			enabled: true,
			targetChannelId: '987654321098765432'
		},
		perGuild: {
			'123456789012345678': {
				playbackVoice: 'verse'
			}
		}
	}
}
```
Voice lifecycle events (`session:start`, `session:stop`, `config:change`, `transcript:segment`) are available via `AI.onVoiceEvent`. Transcript embeds require `CONNECT`, `SPEAK`, and `SEND_MESSAGES` permissions.
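For example, a minimal sketch that listens for transcript segments (the payload shape is engine-specific, so it is logged as-is):

```ts
import { AI } from '@robojs/ai'

// Log transcript segments as they arrive. The payload shape depends on the
// engine, so it is simply passed to the logger here.
AI.onVoiceEvent('transcript:segment', (payload) => {
	console.log('Transcript segment:', payload)
})
```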
> [!TIP]
> `server-vad` is recommended; it automatically trims silence and hands conversations back to the model without manual end markers.
## Token Usage & Limits
Token accounting is handled by the `tokenLedger` singleton.
- Records usage for every AI call (chat, voice, image, custom tools).
- Groups stats by window: `day`, `week`, `month`, `lifetime`.
- Enforces limits in `block` mode (throws) or `alert` mode (emits events).
- Tracks per-model totals, ideal for budgeting across models/providers.
```js
export default {
	usage: {
		limits: [
			{
				window: 'month',
				mode: 'block',
				maxTokens: 1_000_000,
				models: ['gpt-4o', 'gpt-4o-mini']
			},
			{
				window: 'day',
				mode: 'alert',
				maxTokens: 200_000
			}
		]
	}
}
```
```ts
import { AI, tokenLedger, TokenLimitError } from '@robojs/ai'

try {
	await AI.chatSync([{ role: 'user', content: 'Summarize the meeting notes.' }], {})
} catch (error) {
	if (error instanceof TokenLimitError) {
		// Notify admins or throttle usage
	}
}

const weekly = await AI.getUsageSummary({ window: 'week' })
const lifetime = tokenLedger.getLifetimeTotals()
```
Usage events:
```ts
tokenLedger.on('usage.recorded', (payload) => {
	console.log('Usage recorded:', payload)
})

tokenLedger.on('usage.limitReached', (payload) => {
	// Alert your team via webhook/DM
})
```
> ⚠️ Exceeding a `block` mode limit throws `TokenLimitError`; the `/ai usage` command helps server admins monitor usage directly inside Discord.
## JavaScript API Reference
### AI Singleton
- `AI.chat(messages: ChatMessage[], options: ChatOptions)` – Streaming conversation helper that acknowledges Discord context. `messages` is the array of chat turns (system, user, assistant). `options` extends engine chat settings with Discord context and requires `onReply`.
- `AI.chatSync(messages: ChatMessage[], options: Omit<ChatOptions, 'onReply'>)` – Promise that resolves with the first assistant reply, suitable for HTTP handlers or scripts.
- `AI.generateImage(options)` – Image creation via the configured engine.
- `AI.isReady()` – Checks engine initialization state.
- `AI.getActiveTasks(channelId?)` – Returns active background task snapshots.
### Voice Controls
- `AI.startVoice({ guildId, channelId, textChannelId? })`
- `AI.stopVoice({ guildId, channelId? })`
- `AI.setVoiceConfig({ patch, guildId? })`
- `AI.getVoiceStatus(guildId?)`
- `AI.getVoiceMetrics()`
- `AI.onVoiceEvent(event, listener)` / `AI.offVoiceEvent(event, listener)`
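As a rough sketch using only the documented option names (the IDs below are placeholders), joining and later leaving a voice channel could look like this:

```ts
import { AI } from '@robojs/ai'

// Placeholder snowflake IDs; replace with your own guild and channel IDs.
await AI.startVoice({
	guildId: '123456789012345678',
	channelId: '234567890123456789',
	// Optional; assumed to be where transcript embeds and replies are posted
	textChannelId: '345678901234567890'
})

// End the session for that guild when you are done.
await AI.stopVoice({ guildId: '123456789012345678' })
```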
### Usage Tracking
- `AI.getUsageSummary(query)` – Aggregated usage per window.
- `AI.getLifetimeUsage(model?)` – Lifetime totals.
- `AI.onUsageEvent(event, listener)` / `AI.onceUsageEvent(event, listener)` / `AI.offUsageEvent(event, listener)`
### Token Ledger Direct Access
- `tokenLedger.recordUsage(entry)`
- `tokenLedger.getSummary(query)`
- `tokenLedger.getLifetimeTotals(model?)`
- `tokenLedger.configure(config)`
- `tokenLedger.getLimits()` / `tokenLedger.setLimits(limits)`
- `tokenLedger.getLimitState(model?)`
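A short sketch of direct ledger access, reusing the window and limit shapes shown in the usage configuration above (whether `getSummary` returns a promise is an assumption, so it is awaited defensively):

```ts
import { tokenLedger } from '@robojs/ai'

// Inspect the current limits and today's aggregated usage.
const limits = tokenLedger.getLimits()
const today = await tokenLedger.getSummary({ window: 'day' })

// Tighten the daily cap at runtime, using the same shape as `usage.limits`.
tokenLedger.setLimits([{ window: 'day', mode: 'alert', maxTokens: 100_000 }])
```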
## Command Execution
Natural language commands are routed to your registered slash commands via tool calls.
- Permission checks mimic Discord's native rules (roles, DM ban lists).
- Sage deferral kicks in after `deferBuffer` (default 3s) to avoid interaction timeouts.
- Background jobs show typing indicators and follow-up messages until completion.
```ts
// Example snippet from a custom tool handler
AI.chat(messages, {
	channel,
	member,
	onReply: (reply) => {
		if (reply.type === 'command-executed') {
			console.log('Command dispatched:', reply.command)
		}
	}
})
```
> [!IMPORTANT]
> Keep your slash command permissions in sync. The AI respects them but cannot override Discord's permission model.
## Insights (Vector Store)
- Drop files into `/documents` and Robo syncs them to a vector store on startup.
- Supports all file types listed in the OpenAI file search docs.
- Hash-based diffing uploads only changed files.
- Disable by setting `insight: false` in the plugin config.
```js
export default {
	insight: false
}
```
> 💡 Remind the bot to reference key documents by mentioning them in `instructions` for higher recall accuracy.
## Web API
Pair with `@robojs/server` to expose REST endpoints.
**POST `/api/ai/chat`**

- Request body: `{ messages: ChatMessage[] }`
- Response body: `{ message: string }`
```ts
// src/api/ai/chat.ts
import { AI } from '@robojs/ai'
import type { ChatMessage } from '@robojs/ai'
import type { RoboRequest } from '@robojs/server'

export default async function handler(req: RoboRequest) {
	const { messages } = await req.json<{ messages: ChatMessage[] }>()

	if (!messages?.length) {
		return { message: '' }
	}

	const reply = await AI.chatSync(messages, {})

	return { message: reply.text ?? '' }
}
```
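Once the route is live, any HTTP client can call it. A minimal `fetch` example (host and port are placeholders for wherever your Robo's server listens):

```ts
// Host and port are placeholders; point this at your deployed Robo server.
const response = await fetch('http://localhost:3000/api/ai/chat', {
	method: 'POST',
	headers: { 'Content-Type': 'application/json' },
	body: JSON.stringify({
		messages: [{ role: 'user', content: 'Give me a one-line status update.' }]
	})
})

const { message } = await response.json()
console.log(message)
```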
> [!NOTE]
> Install `@robojs/server` via `npx robo add @robojs/server` to enable the route.
## Custom Engine Development
Create your own engine by extending the abstract base class.
```ts
import { BaseEngine } from '@robojs/ai'
import type {
	ChatOptions,
	ChatResult,
	VoiceSessionHandle,
	VoiceSessionStartOptions
} from '@robojs/ai/engines/base'

class MyEngine extends BaseEngine {
	async init() {
		// Warm up connections, perform auth
	}

	async chat(options: ChatOptions): Promise<ChatResult> {
		// Call your provider and return the assistant response
		return {
			messages: [
				{
					role: 'assistant',
					content: 'Hello from a custom provider!'
				}
			]
		}
	}

	supportedFeatures() {
		return {
			voice: false,
			image: true,
			webSearch: false
		}
	}
}

export default new MyEngine()
```
Mandatory methods: `init()`, `chat()`, `supportedFeatures()`. Optional overrides include `generateImage()`, `startVoiceSession()`, `stopVoiceSession()`, and `summarizeToolResult()`. Engines emit hooks via `engine.on(event, listener)` if you want to observe chat flow or tool usage.
> [!TIP]
> The engine lifecycle is production ready; use it to integrate Azure OpenAI, Anthropic, Llama, or on-prem solutions.
## Troubleshooting
- **Missing voice audio** – Confirm the optional voice dependencies are installed and the bot has `CONNECT` + `SPEAK` permissions.
- **Bot ignores channels** – Check the `restrict` and `whitelist` configuration; `restrict` takes precedence.
- **`TokenLimitError` thrown** – You exceeded a `block` mode limit. Lower usage or switch to `alert`.
- **High latency** – Large insight datasets can slow responses; prune `/documents` or switch to a faster engine model (e.g., `gpt-4.1-mini`).
- **Voice quality issues** – Tune `capture.sampleRate` and the `endpointing` strategy. `server-vad` smooths out long silences.
- **Transcript embeds missing** – Ensure the transcript channel exists and the bot can send messages there.
> [!HELP]
> Still stuck? Enable debug logging via `ROBO_LOG_LEVEL=debug` and watch for detailed engine traces.
## Got questions?
If you need help, hop into our community Discord. We love to chat, and Sage (our resident AI Robo) is always online to assist.
- **Community:** Join our Discord server