Jacky AI Agent
This project is publicly accessible.
In One Sentence
Jacky AI Agent (Jabot) is a personal AI assistant I designed, built, and deployed end-to-end — centred on persistent knowledge (global cloud drive + RAG notes), deep exploration (daily seeds & note sprouting), grounded, tool-backed answers (cited web search, video parsing, maps, documents, code sandbox, image & animation generation), and life management (memos, timed reminders, unified push notifications).
Try it: jacky-info.com/jackybot/
The Problem
Most AI chat products treat every conversation as an island. Files disappear inside old threads, notes never feed back into answers, and the model freely invents facts with no sources. Video links on Bilibili or YouTube are often unreadable. Location, weather, and maps require switching apps. Document work, code execution, and media creation live in separate tools. Reminders and memos, if they exist at all, do not push to your device when they matter.
Jabot addresses these gaps as one integrated system — not a chat box with bolt-on features, but a personal knowledge and action layer where uploads, AI outputs, notes, citations, and scheduled notifications all connect.
Ten Core Capabilities
These are the main design pillars — each solves a distinct pain point.
1. Global Cloud Drive — Upload Once, Reference Everywhere
Pain point: In typical AI chats, files are trapped inside a single conversation. You re-upload the same PDF three times, or scroll through dozens of old threads to find something the AI generated last week.
What Jabot does:
- One global cloud drive shared across all conversations — not per-chat attachments.
- Two logical zones: user uploads (用户上传的文件) and AI-generated files (AI生成的文件). Word exports, sandbox outputs, generated images, rendered MP4s, and similar outputs all land in the agent zone automatically.
- Upload once, reference in any chat: pick files, folders, or entire directory trees from the reference picker, and attach them as chips or folder references without re-uploading.
- Folder upload & batch drag-and-drop with progress tracking; in-browser preview for PDF, Office, images, and 3D models (GLTF/OBJ/STL/PLY).
- Storage quotas: 2 GiB (free) / 20 GiB (Pro); separate accounting for uploads vs agent-generated storage.
- Sandbox integration: load cloud files into an isolated execution environment, run transformations, then sync results back to the cloud drive — the output persists even after the chat ends.
The cloud drive is the single source of truth for everything you and the AI produce. You never hunt through scattered conversations for a file again.
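The quota rule above could be checked with something like the following. This is a minimal sketch, not Jabot's code: the names Usage, can_store, and record are hypothetical, and it assumes both zones are counted separately but measured against one shared tier limit (the text does not say whether each zone has its own limit).

```python
from dataclasses import dataclass

GIB = 1024 ** 3
# Tier limits quoted in the text above.
TIER_QUOTA_BYTES = {"free": 2 * GIB, "pro": 20 * GIB}


@dataclass
class Usage:
    """Bytes stored per zone (hypothetical structure)."""
    user_uploads: int = 0
    agent_files: int = 0

    @property
    def total(self) -> int:
        return self.user_uploads + self.agent_files


def can_store(usage: Usage, tier: str, incoming_bytes: int) -> bool:
    """Return True if a new file fits within the tier quota.

    Assumption: both zones count against one shared limit.
    """
    return usage.total + incoming_bytes <= TIER_QUOTA_BYTES[tier]


def record(usage: Usage, zone: str, size: int) -> None:
    """Add a stored file's size to the counter for its zone."""
    if zone == "user":
        usage.user_uploads += size
    elif zone == "agent":
        usage.agent_files += size
    else:
        raise ValueError(f"unknown zone: {zone}")


# Example: 1 GiB uploaded by the user, 0.5 GiB generated by the agent.
usage = Usage()
record(usage, "user", GIB)
record(usage, "agent", GIB // 2)
```

Keeping two counters makes it possible to report upload and agent storage separately even if only the combined total is enforced.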
2. Notes as a Growing RAG Knowledge Base
Pain point: Saving snippets in chat history does not make the AI smarter over time. Keyword search misses meaning. Knowledge stays siloed.
What Jabot does:
- Save anything into notes: chat replies, web pages, video summaries, voice captures, manual entries. Save from the Knowledge Base page or via the notes_save tool during conversation.
- Automatic metadata: the LLM generates a title (≤20 chars) and 3–4 tags on save if you do not provide them.
- Folder organisation, sharing (public / private links), swipe-style reading, full source text preservation.
- Full RAG pipeline:
  - On save → content written to MongoDB → async embedding via Voyage AI (voyage-4-lite, 1024-dim) → vector stored in Qdrant with per-user isolation.
  - On query → notes_search runs semantic search (cosine similarity + relevance gate) or keyword / list modes.
  - The agent retrieves relevant notes and injects them into context, so answers are grounded in your library, not generic training data.
- Compounding personalisation: the more you add, the richer retrieval becomes. Factual questions trigger parallel notes_search + web_search_jacky so personal knowledge and fresh web data combine in one answer.
- Citation markers [^N] on claims drawn from notes or web, with a clickable source list at the bottom of every reply.
Your notes are not a dead archive; they are a living knowledge base that shapes every future answer.
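The query step of the pipeline (cosine similarity, a relevance gate, and per-user isolation) can be illustrated in plain Python. This is only a sketch of the technique: in Jabot the vectors live in Qdrant and come from Voyage AI, whereas here NoteVector, search_notes, the in-memory index, and the 0.5 threshold are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class NoteVector:
    user_id: str
    note_id: str
    vector: list[float]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search_notes(index, user_id, query_vec, top_k=5, min_score=0.5):
    """Return up to top_k (score, note_id) pairs for one user, best first."""
    scored = [
        (cosine(query_vec, note.vector), note.note_id)
        for note in index
        if note.user_id == user_id  # per-user isolation: other users are never scored
    ]
    # Relevance gate: drop weak matches instead of padding the context with them.
    hits = [(score, note_id) for score, note_id in scored if score >= min_score]
    hits.sort(reverse=True)
    return hits[:top_k]


# Toy 2-dimensional vectors stand in for the 1024-dimensional embeddings.
index = [
    NoteVector("alice", "a1", [1.0, 0.0]),
    NoteVector("alice", "a2", [0.0, 1.0]),
    NoteVector("bob", "b1", [1.0, 0.0]),
]
hits = search_notes(index, "alice", [1.0, 0.0])
```

Here only note a1 survives: a2 is orthogonal to the query and fails the gate, and b1 belongs to another user.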
3. Daily Seeds & Note Sprouting — Deep Exploration Engine
Pain point: You save notes but rarely revisit them. Generic AI prompts ignore your interests and today's world. Insights stay shallow.
What Jabot does:
Today's Seeds (每日种子)
- Each day, the system generates a personalised seed pool (default ~12 seeds) anchored to today's real-world news and trends.
- Generation pipeline:
  - Parallel web searches: global news, tech trends, China hot topics, English headlines, plus queries derived from your interest tags.
  - LLM synthesises candidate seeds using your user profile summary and interests: multi-domain, vivid, date-specific.
- Seeds cached per user per day; a nightly job refreshes pools for active users.
- Each seed has text, topic, and searchHint. Tap a seed → AI summarises with cited sources → optionally sprout into a full exploration report.
- Sprouted seeds are tracked so you can see what you have already explored.
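The seed shape and the per-user, per-day caching described above might look roughly like this. The field names text, topic, and searchHint come from the description; the Seed and SeedCache classes, the in-memory dictionary, and the fake generator are hypothetical stand-ins for the real search and LLM pipeline.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable


@dataclass(frozen=True)
class Seed:
    text: str
    topic: str
    searchHint: str


class SeedCache:
    """Caches one seed pool per (user, day); generation runs at most once per key."""

    def __init__(self, generate: Callable[[str, date], list[Seed]]):
        self._generate = generate
        self._pools: dict[tuple[str, date], list[Seed]] = {}

    def get_pool(self, user_id: str, day: date) -> list[Seed]:
        key = (user_id, day)
        if key not in self._pools:
            self._pools[key] = self._generate(user_id, day)
        return self._pools[key]


# Stand-in generator that records how often it is called.
calls = []


def fake_generate(user_id: str, day: date) -> list[Seed]:
    calls.append((user_id, day))
    return [Seed(text=f"Seed for {day.isoformat()}", topic="tech", searchHint="example query")]


cache = SeedCache(fake_generate)
first = cache.get_pool("u1", date(2025, 1, 6))
again = cache.get_pool("u1", date(2025, 1, 6))   # served from cache
next_day = cache.get_pool("u1", date(2025, 1, 7))  # new day, new pool
```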
Note Sprouting (笔记发芽)
- Pick one note, today's notes, or a daily seed as input.
- Two-stage engine:
  - Extract key themes → keyword + semantic search across your note library → 1–2 web searches for cross-domain context.
  - Deep LLM analysis produces structured output: seed (core insight kernel), aha moments (cross-domain surprises), topics, explorations (question + insight pairs), search keywords, and a summary report.
- Results can be saved directly back to notes — knowledge compounds again.
- Usage limits: 1 sprout/day (free), 3/day (Pro).
Together, seeds pull you toward today's hotspots aligned with your profile; sprouting pushes you deeper into what you already know — turning storage into discovery.
4. Bilibili & YouTube — Full Content Parsing
Pain point: Pasting a video link into most AI tools yields "I cannot access this video." You still need to watch or manually transcribe.
What Jabot does:
- Dedicated Content Hub / VideoHub pipeline for Bilibili and YouTube URLs.
- Transcript extraction:
  - YouTube: Downsub API + yt-dlp fallback.
  - Bilibili: yt-dlp CC subtitles; URL short-link resolution.
  - No subtitles? Downloads audio and runs Deepgram speech recognition.
- Structured output card with three tabs:
  - Source: full transcript with timestamps (video) or scraped page text (web links).
  - Summary: streaming AI synopsis in your language.
  - Mind map: interactive Markmap visualisation of the content structure.
- One-click save to notes — transcript, summary, and mind map persist into your RAG library for future retrieval.
- Also handles generic web links via the same hub (scraping + summarise + mind map).
Video and link content becomes first-class knowledge — parsed, summarised, visualised, and storable in seconds.
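The transcript fallback order (subtitle API, then yt-dlp, then speech recognition) is an instance of a simple try-in-order chain. The sketch below shows that pattern only; extract_transcript and the stub fetchers are hypothetical and do not call Downsub, yt-dlp, or Deepgram.

```python
from typing import Callable, Optional

# A fetcher takes a URL and returns transcript text, or None if it found nothing.
Fetcher = Callable[[str], Optional[str]]


def extract_transcript(url: str, fetchers: list[tuple[str, Fetcher]]) -> tuple[str, str]:
    """Try each fetcher in order and return (source_name, transcript)."""
    errors = []
    for name, fetch in fetchers:
        try:
            text = fetch(url)
        except Exception as exc:  # a failing provider should not stop the chain
            errors.append(f"{name}: {exc}")
            continue
        if text:
            return name, text
        errors.append(f"{name}: no transcript")
    raise RuntimeError("all transcript sources failed: " + "; ".join(errors))


# Stub providers standing in for the real services.
def subtitle_api(url: str) -> Optional[str]:
    raise TimeoutError("provider timed out")


def subtitle_download(url: str) -> Optional[str]:
    return None  # the video has no subtitles


def speech_to_text(url: str) -> Optional[str]:
    return "hello and welcome"


source, transcript = extract_transcript(
    "https://example.com/video",
    [
        ("subtitle_api", subtitle_api),
        ("subtitle_download", subtitle_download),
        ("speech_to_text", speech_to_text),
    ],
)
```

Ordering the cheapest source first means the audio download and recognition step only runs when no subtitles exist.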
5. Global Web Search with Source Citations
Pain point: AI answers sound confident but cite nothing. Hallucinated facts are indistinguishable from real ones.
What Jabot does:
- Multi-provider search with automatic failover: Search1API (incl. Bilibili/Zhihu-targeted search) → Brave → SerpAPI → SearchAPI.io. Multiple API keys load-balance; exhausted keys switch automatically.
- Web scraping for page-level detail — CloakBrowser microservice (58 anti-detection patches, JS rendering, in-page image OCR) with httpx + Readability fallback.
- Mandatory citation discipline: whenever web_search_jacky, web_scraping, or notes_search runs, the agent must attach [^N] markers inline; the frontend renders a collapsible 参考来源 (References) footer with clickable links (external URLs) or note references (open in Knowledge Base).
- Citations are auto-built from tool results, not LLM-invented URLs.
- Domain blocklist support for filtering unwanted sources.
Search is not a black box — every factual claim can be traced to a source.
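Building citations from tool results rather than from model text can be sketched as a small registry that hands out [^N] markers. The Source and CitationRegistry names and the footer format are assumptions for illustration; only the [^N] marker style comes from the description above.

```python
from dataclasses import dataclass


@dataclass
class Source:
    title: str
    ref: str  # external URL or note identifier, taken from a tool result


class CitationRegistry:
    """Assigns stable [^N] markers to sources in first-seen order."""

    def __init__(self) -> None:
        self._sources: list[Source] = []
        self._index: dict[str, int] = {}

    def cite(self, source: Source) -> str:
        # Reuse the existing number when the same source is cited again.
        if source.ref not in self._index:
            self._sources.append(source)
            self._index[source.ref] = len(self._sources)
        return f"[^{self._index[source.ref]}]"

    def footer(self) -> str:
        return "\n".join(
            f"[^{n}]: {s.title} ({s.ref})" for n, s in enumerate(self._sources, 1)
        )


registry = CitationRegistry()
m1 = registry.cite(Source("Example article", "https://example.com/a"))
m2 = registry.cite(Source("My note on RAG", "note:123"))
m3 = registry.cite(Source("Example article", "https://example.com/a"))
footer = registry.footer()
```

Because every marker maps to an entry the tools actually returned, a URL that never appeared in a tool result cannot end up in the footer.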
6. Real-Time Location, Maps & Weather
Pain point: "What's the weather near me?" or "Find a café on my route" requires leaving the chat and opening separate map apps.
What Jabot does:
- Browser geolocation (current_location): latitude, longitude, and accuracy are injected into agent context when permitted.
- Dual map providers: Google Maps (international) and Amap 高德 (China); user preference persisted via set_map_preference.
- Navigation skill: geocoding, place search, turn-by-turn directions, live traffic, place details (hours, rating, phone, photos).
- Weather skill: forecast by location, air quality index.
- Context injection: timezone, map preference, and location flow into jacky_context on every conversation, so the agent knows where and when you are asking from.
- Local POI recommendations combine web discovery with map-verified place details, not unverified search snippets.
Location-aware answers without app-switching.
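The context injection described above can be pictured as assembling a small dictionary per conversation. This is a guess at the shape, not the real schema: the key names, the "google" / "amap" preference values, and the build_jacky_context function are hypothetical; only the three ingredients (timezone, map preference, optional location) come from the text.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class Location:
    latitude: float
    longitude: float
    accuracy: float  # metres, as reported by browser geolocation


def build_jacky_context(
    timezone: str, map_preference: str, location: Optional[Location]
) -> dict:
    """Assemble per-conversation context; location is omitted unless permitted."""
    if map_preference not in ("google", "amap"):
        raise ValueError(f"unknown map provider: {map_preference}")
    context = {"timezone": timezone, "map_preference": map_preference}
    if location is not None:
        context["location"] = asdict(location)
    return context


with_location = build_jacky_context(
    "Asia/Shanghai", "amap", Location(31.2304, 121.4737, 25.0)
)
without_location = build_jacky_context("Europe/London", "google", None)
```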
7. Full Document Editing & Academic Assistant
Pain point: AI can chat about documents but cannot reliably produce or edit Office files, PDFs, or formatted citations.
What Jabot does:
- Dedicated skills for each format, executed inside the per-user sandbox:
  - Word (docx): create, edit, convert documents.
  - PowerPoint (pptx): create, edit, slide thumbnails.
  - Excel (xlsx): create, edit, formulas.
  - PDF (pdf): 9 operation types, including merge, split, form fill, and convert.
- Workflow: agent writes/edits in sandbox → sandbox_sync_to_cloud → file appears in AI生成的文件 → preview in browser or download.
- file_read reads cloud files back into context for iterative editing across sessions.
- Academic research skill:
  - academic_paper_search: find papers across academic databases.
  - format_citation: APA, MLA, Chicago, and other citation styles.
- Mind map skill — Markdown mind maps + PNG export for any topic or parsed content.
From draft to formatted deliverable — stored in your cloud drive, not lost in chat.
8. Sandbox & Code Execution
Pain point: You need to run a script, transform data, or batch-process files — but chat-only AI cannot execute anything.
What Jabot does:
- Per-user isolated shell environment on the server.
- Tools: sandbox_run_command, sandbox_write_file, sandbox_read_file, sandbox_patch_file, sandbox_list_dir.
- Cloud bridge: sandbox_load_from_cloud pulls your uploads or agent files in; sandbox_sync_to_cloud pushes outputs back.
- Configurable timeout (default 60 s, max 600 s).
- Used by document skills, animation rendering, data processing, and any task the agent decides requires code.
The agent does not just suggest commands — it runs them safely and persists results.
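The timeout rule (default 60 s, capped at 600 s) could be applied as follows. This sketch covers only the timeout handling, not isolation: clamp_timeout and run_command are hypothetical helpers built on the standard library's subprocess.run, and the 1-second lower bound is an assumption.

```python
import subprocess
import sys

DEFAULT_TIMEOUT_S = 60
MAX_TIMEOUT_S = 600


def clamp_timeout(requested):
    """Return the effective timeout in seconds for a requested value."""
    if requested is None:
        return DEFAULT_TIMEOUT_S
    # Assumed lower bound of 1 s; the upper bound comes from the text.
    return max(1, min(int(requested), MAX_TIMEOUT_S))


def run_command(argv, timeout_s=None):
    """Run a command and report output, exit status, and whether it timed out."""
    try:
        proc = subprocess.run(
            argv, capture_output=True, text=True, timeout=clamp_timeout(timeout_s)
        )
    except subprocess.TimeoutExpired:
        return {"ok": False, "timed_out": True, "stdout": "", "stderr": ""}
    return {
        "ok": proc.returncode == 0,
        "timed_out": False,
        "stdout": proc.stdout,
        "stderr": proc.stderr,
    }


# Example: run a short Python one-liner with the default timeout.
demo = run_command([sys.executable, "-c", "print('hi')"])
```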
9. Image Generation, Editing & HTML→MP4 Animation
Pain point: Image and video creation require separate subscriptions and manual file management.
What Jabot does:
Image Generation & Editing
- image_generate: text-to-image via poyo.ai gpt-image-2; configurable quality and resolution.
- image_edit: modify existing images (inpainting, style change, background removal, etc.) via gpt-image-2-edit.
- Input from cloud paths or attachment IDs; works on both user uploads and prior AI-generated images under AI图片生成/.
- Background task + Web Push notification when ready, so there is no need to keep the tab open.
- Output saved to AI生成的文件/AI图片 and immediately referenceable in future chats.
HTML Animation → MP4
- animation_demo skill: author HTML + GSAP animations following HyperFrames conventions (1920×1080 compositions, timeline registry).
- Agent writes HTML in sandbox → syncs to cloud → animation_render_video renders MP4 server-side (Node.js HyperFrames + ffmpeg).
- Official motion-design doc library (animation_read_doc) for styles, typography, transitions, and component catalog.
- Finished MP4 lands in the cloud drive, shareable and previewable like any other file.
Creative output is treated like any other asset — generated, stored, and reusable globally.
10. Memos, Timed Reminders & Push Notifications
Pain point: AI chats forget what you asked them to remind you of. Phone alarm apps do not understand natural language. Task lists and timed alerts live in different apps, and web-based AI tools rarely notify you when something is due — especially when the tab is closed.
What Jabot does:
Two distinct modes — memos vs reminders
Jabot separates what to remember from when to notify:
| Type | Tool | Use case |
|---|---|---|
| Memo (备忘) | todo_manager | Quick capture, checklists, tasks without a specific trigger time — e.g. "buy milk", "follow up with client" |
| Timed reminder (提醒) | reminder_manager | Alerts that must fire at a concrete date/time — e.g. "remind me at 9am every weekday", "in 30 minutes" |
The agent is instructed not to mix them: timed notifications always go through reminder_manager; memo-style task tracking always goes through todo_manager.
Reminder capabilities
- One-time reminders: specify firstRunDate (today / tomorrow / YYYY-MM-DD), or relativeMinutes ("in 30 minutes").
- Recurring reminders: iCalendar RRULE rules (daily; weekly, e.g. Mon/Wed/Fri; monthly, e.g. 1st of month) with custom end dates.
- Timezone-aware: every schedule carries an IANA timezone (Asia/Shanghai, etc.); the backend computes dtstart, so no manual timestamp math is needed.
- Holiday exclusion: pass exdatesMs to skip public holidays (current year + next year); the backend normalises to midnight in the user's timezone.
- Natural-language creation: home-screen quick action or /ai/home/quick-reminder API; the LLM parses "every Monday at 10am" into structured RRULE + time + timezone.
- Agent-managed: during chat, the scheduling skill creates, updates, lists, and deletes reminders on your behalf.
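How a first run time and an RRULE string might be derived from these inputs is sketched below. The firstRunDate and relativeMinutes semantics follow the list above; the functions resolve_first_run and build_rrule, their signatures, and the fixed UTC+8 offset used in the example are assumptions (a real backend would more likely use zoneinfo.ZoneInfo("Asia/Shanghai")).

```python
from datetime import datetime, time, timedelta, timezone


def resolve_first_run(now, at, first_run_date=None, relative_minutes=None):
    """Compute the first trigger time; `now` must be timezone-aware."""
    if relative_minutes is not None:
        return now + timedelta(minutes=relative_minutes)
    if first_run_date in (None, "today"):
        day = now.date()
    elif first_run_date == "tomorrow":
        day = now.date() + timedelta(days=1)
    else:
        day = datetime.strptime(first_run_date, "%Y-%m-%d").date()
    # Keep the user's timezone so no manual offset arithmetic is needed.
    return datetime.combine(day, at, tzinfo=now.tzinfo)


def build_rrule(freq, byday=None, bymonthday=None, until=None):
    """Build an iCalendar (RFC 5545) RRULE value such as FREQ=WEEKLY;BYDAY=MO,WE,FR."""
    parts = [f"FREQ={freq}"]
    if byday:
        parts.append("BYDAY=" + ",".join(byday))
    if bymonthday is not None:
        parts.append(f"BYMONTHDAY={bymonthday}")
    if until is not None:
        # UNTIL is written in UTC basic format.
        parts.append("UNTIL=" + until.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ"))
    return ";".join(parts)


tz = timezone(timedelta(hours=8))  # stand-in for Asia/Shanghai
now = datetime(2025, 1, 6, 8, 0, tzinfo=tz)

tomorrow_9am = resolve_first_run(now, time(9, 0), first_run_date="tomorrow")
in_30_min = resolve_first_run(now, time(9, 0), relative_minutes=30)
weekly = build_rrule("WEEKLY", byday=["MO", "WE", "FR"])
monthly = build_rrule("MONTHLY", bymonthday=1, until=datetime(2025, 12, 31, 23, 59, 59, tzinfo=tz))
```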
Memo (todo) capabilities
- Create, edit, complete, and delete memos — supports multi-line batch input (one line = one memo).
- Optional due date and priority (low / medium / high).
- Task Centre UI with dedicated views:
  - Reminder tab: calendar layout showing upcoming timed alerts; click to view/edit/delete.
  - Memo tab: pending and completed lists; inline edit and mark-done.
  - Weekly / monthly stats: reminder completion rate, memo totals, completion percentage.
- Quick capture from home screen ("添加备忘") or chat ("添加备忘:…") without starting a full conversation.
Background schedule engine
- Persistent storage in MongoDB (schedule_items collection).
- A background ScheduleService continuously computes next run times via RRULE, handles misfires (6-hour grace window), retries with exponential backoff, and cleans up stale jobs.
- When a reminder fires, the payload (title + message) is delivered through the unified push layer.
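The misfire window and retry backoff mentioned above can be expressed in a few lines. The 6-hour grace window is from the text; the function names, the 5-second base delay, and the 300-second cap are hypothetical values chosen for the sketch.

```python
from datetime import datetime, timedelta

MISFIRE_GRACE = timedelta(hours=6)


def should_fire(scheduled_at: datetime, now: datetime) -> bool:
    """Fire if the job is due and was missed by no more than the grace window."""
    lateness = now - scheduled_at
    return timedelta(0) <= lateness <= MISFIRE_GRACE


def backoff_delay(attempt: int, base_s: float = 5.0, cap_s: float = 300.0) -> float:
    """Seconds to wait before retry number `attempt` (1-based), doubling each time."""
    return min(cap_s, base_s * 2 ** (attempt - 1))


scheduled = datetime(2025, 1, 6, 9, 0)
late_but_ok = should_fire(scheduled, scheduled + timedelta(hours=5))
too_late = should_fire(scheduled, scheduled + timedelta(hours=7))
not_due_yet = should_fire(scheduled, scheduled - timedelta(minutes=1))
delays = [backoff_delay(n) for n in (1, 2, 3, 10)]
```

A grace window like this lets a reminder still fire after a short outage while stopping very stale alerts from arriving hours out of context.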
Unified push — Web Push + native FCM
Notifications are not an afterthought — they are a first-class delivery channel:
- Browser Web Push (VAPID):
  - User enables notifications in the User Centre → subscription stored in MongoDB.
  - Service Worker receives push even when the PWA tab is closed.
  - Test notification button to verify the pipeline end-to-end.
  - Stale subscriptions (404/410) auto-disabled; delivery errors tracked per endpoint.
- Native app push (Firebase FCM):
  - Flutter WebView shell registers FCM tokens via the JabotNative bridge.
  - The same unified send_user_push_notification API delivers to both Web Push and FCM in one call.
- Deep links: tapping a reminder notification opens /jackybot/super-agent?reminderId=… and goes straight to the reminder detail, including the occurrence timestamp for recurring items.
- Background task notifications: long-running jobs (image generation, image editing, animation render) also push when complete ("your image is ready in cloud drive"), so you do not need to keep the tab open.
Reminders and memos turn Jabot from a session-based chat into something that follows you through the day — capture in natural language, schedule with timezone precision, get notified on web or mobile.
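The unified push behaviour (one call fanning out to Web Push and FCM, with stale Web Push subscriptions disabled on 404/410) can be sketched like this. The function name send_user_push_notification appears in the description, but its signature here, the Endpoint structure, and the stub senders are assumptions; no real push service is contacted.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Endpoint:
    kind: str            # "webpush" or "fcm"
    target: str          # subscription URL or FCM token
    enabled: bool = True
    error_count: int = 0


# A sender delivers one payload and returns an HTTP-style status code.
Sender = Callable[[Endpoint, dict], int]


def send_user_push_notification(endpoints, payload, senders) -> int:
    """Deliver a payload to every enabled endpoint; return the number delivered."""
    delivered = 0
    for ep in endpoints:
        if not ep.enabled:
            continue
        status = senders[ep.kind](ep, payload)
        if 200 <= status < 300:
            delivered += 1
        elif ep.kind == "webpush" and status in (404, 410):
            ep.enabled = False  # subscription is gone, stop retrying it
        else:
            ep.error_count += 1  # track delivery errors per endpoint
    return delivered


def fake_webpush(ep: Endpoint, payload: dict) -> int:
    return 410 if ep.target.endswith("/gone") else 201


def fake_fcm(ep: Endpoint, payload: dict) -> int:
    return 200


endpoints = [
    Endpoint("webpush", "https://push.example/ok"),
    Endpoint("webpush", "https://push.example/gone"),
    Endpoint("fcm", "token-123"),
]
delivered = send_user_push_notification(
    endpoints,
    {"title": "Standup", "message": "Starts in 5 minutes"},
    {"webpush": fake_webpush, "fcm": fake_fcm},
)
```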
Supporting Capabilities
Beyond the ten pillars, Jabot also includes:
Conversation & Agent Loop
- Multi-turn streaming chat with visible thinking steps and tool progress (SSE).
- Up to 20 tool-call iterations per request; lazy skill activation to control token cost.
- Conversation history with automatic compression, session title generation, and full-text search.
- Voice input — Chinese, English, auto-detect (upload or real-time WebSocket via Deepgram Nova-3).
- Model selection per session (Claude, GPT, DeepSeek, Gemini, Groq, Zhipu, self-hosted vLLM, etc.).
- Long-term user memory — preferences and facts persist across sessions; nightly consolidation job.
Cross-Platform Access
- Web: Vue 3 PWA — installable on desktop and mobile home screen; dark / light theme.
- Mobile: Flutter WebView shell with native bridge (haptics, secure storage, push notifications).
- Service Worker offline cache.
Security & Quotas
- JWT authentication (180-day tokens); brute-force protection; optional hCaptcha; beta whitelist mode.
- Rate limiting (40–120 chat requests/min by tier); monthly token quotas (2M free / 20M Pro).
- Session concurrency limits, upload size limits (30 MB per file), storage quotas.
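Per-tier request limits like these are commonly enforced with a sliding window. The sketch below shows that technique only: the text gives a range of 40 to 120 requests per minute, so mapping 40 to free and 120 to Pro is an assumption, as are the RateLimiter class and its in-memory storage.

```python
from collections import defaultdict, deque

WINDOW_S = 60.0
# Assumption: the two ends of the 40-120 range map to these tiers.
TIER_LIMITS = {"free": 40, "pro": 120}


class RateLimiter:
    """Sliding-window limiter keyed by user id."""

    def __init__(self) -> None:
        self._hits = defaultdict(deque)

    def allow(self, user_id: str, tier: str, now: float) -> bool:
        hits = self._hits[user_id]
        # Forget requests that have left the one-minute window.
        while hits and now - hits[0] >= WINDOW_S:
            hits.popleft()
        if len(hits) >= TIER_LIMITS[tier]:
            return False
        hits.append(now)
        return True


limiter = RateLimiter()
# 41 requests within about four seconds: the first 40 pass, the 41st is rejected.
results = [limiter.allow("u1", "free", now=i * 0.1) for i in range(41)]
allowed_after_window = limiter.allow("u1", "free", now=60.0)
```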
Agent Skill System (16+ Bundled Skills)
Skills are tool bundles the AI activates on demand. Only core and communication load by default; the agent calls use_skill (or you type /skill-id) to load others.
| Skill | Default | Tools | What it enables |
|---|---|---|---|
| core | ✅ | use_skill, manage_user_memory, current_time | Skill activation, long-term memory, timezone-aware time |
| communication | ✅ | terminate | Gracefully end the agent loop when done |
| user_location | — | current_location | Browser geolocation |
| weather | — | geocoding, weather_api, air_quality | Forecasts and air quality |
| web | — | web_search_jacky, web_scraping, video_content_extract | Search, page scrape, video transcripts |
| navigation | — | geocoding, directions, place_details, route_traffic, set_map_preference | Maps, routes, places, traffic |
| notes | — | notes_save, notes_update, notes_search, notes_list_folders | Save, edit, semantic search notes |
| scheduling | — | reminder_manager, todo_manager | Timed reminders (RRULE) and memo todos |
| sandbox | — | sandbox_run_command, sandbox_write_file, sandbox_read_file, sandbox_patch_file, sandbox_list_dir, sandbox_load_from_cloud, sandbox_sync_to_cloud | Code execution and file ops |
| files | — | file_read, list_dir_cloud, resource_download | Cloud disk access |
| media | — | pexels_search | Stock image search |
| mindmap | — | generate_mindmap | Mind map generation |
| docx / pptx / xlsx / pdf | — | (via sandbox) | Office & PDF create / edit / convert |
| academic_research | — | academic_paper_search, format_citation | Paper search and citations |
| image_generate / image_edit | — | background image tasks | AI image creation and editing |
| animation_demo | — | animation_read_doc, animation_render_video | HTML animation → MP4 |
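Lazy skill activation, as described for use_skill, amounts to exposing only the tools of currently active skills. The sketch below uses a few rows from the table; the SkillSession class and its methods are hypothetical and only illustrate why loading skills on demand keeps the tool list (and token cost) small.

```python
# Skills loaded at the start of every session, per the table above.
DEFAULT_SKILLS = ("core", "communication")

# A subset of the skill table, enough for the example.
SKILL_TOOLS = {
    "core": ["use_skill", "manage_user_memory", "current_time"],
    "communication": ["terminate"],
    "weather": ["geocoding", "weather_api", "air_quality"],
    "notes": ["notes_save", "notes_update", "notes_search", "notes_list_folders"],
}


class SkillSession:
    """Tracks which skills are active in one conversation."""

    def __init__(self) -> None:
        self.active = list(DEFAULT_SKILLS)

    def use_skill(self, skill_id: str) -> list:
        """Activate a skill on demand and return the tools it adds."""
        if skill_id not in SKILL_TOOLS:
            raise KeyError(f"unknown skill: {skill_id}")
        if skill_id not in self.active:
            self.active.append(skill_id)
        return SKILL_TOOLS[skill_id]

    def available_tools(self) -> list:
        """Tools the model may call right now, without duplicates."""
        tools = []
        for skill_id in self.active:
            for tool in SKILL_TOOLS[skill_id]:
                if tool not in tools:
                    tools.append(tool)
        return tools


session = SkillSession()
tools_at_start = session.available_tools()
session.use_skill("weather")
tools_after = session.available_tools()
```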
Real-World Impact
This is my most complete personal project. It combines ideas from KnowFlow (knowledge capture), Easy Life Agent (life management), and enterprise patterns from Project Dashboard / Product Scanner into one production system deployed at jacky-info.com/jackybot/.
The design goal is not "another chatbot" but a daily-use personal intelligence layer: everything you learn, create, and ask stays connected — searchable, citable, and reusable tomorrow.
Skills Demonstrated
| Area | What this project shows |
|---|---|
| AI product design | Multi-pillar UX (cloud + RAG + exploration + grounded search), skill system, memory |
| Full-stack development | Python/FastAPI + Vue 3 + Flutter |
| RAG & vector search | Voyage AI embedding, Qdrant semantic retrieval, relevance filtering |
| Media & content pipelines | Video transcript, mind map, animation render, image generation |
| Cloud & DevOps | Auth, quotas, rate limits, volume storage, Web Push (VAPID) + FCM |
| Integration | LLM, search, speech, maps, scraping, academic APIs, HyperFrames |
How It Works (Plain English)
- You interact via web or mobile — chat, quick tools, or resource hubs (cloud / notes / seeds).
- The FastAPI backend authenticates, checks quotas, and starts an agent loop.
- The agent selects skills, calls tools (search notes, scrape a video, run sandbox code, etc.), and streams results via SSE.
- Anything worth keeping — notes, files, mind maps, generated documents — lands in your global cloud or RAG notes; timed reminders and memos persist in the Task Centre and notify via push when due.
- Next time you ask, retrieval pulls your history + fresh web data + location context — answers get more personal and better sourced over time.
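The agent loop in the steps above, with its cap of 20 tool-call iterations, can be reduced to the following skeleton. The step format (a dict with either "tool" or "answer"), the run_agent function, and the scripted model are hypothetical; streaming, skills, and quotas are left out.

```python
MAX_ITERATIONS = 20  # per-request tool-call limit quoted earlier


def run_agent(ask_model, tools, max_iterations=MAX_ITERATIONS):
    """Alternate between the model and tools until an answer or the limit.

    ask_model(history) returns {"tool": name, "args": {...}} or {"answer": text}.
    """
    history = []
    for _ in range(max_iterations):
        step = ask_model(history)
        if "answer" in step:
            return step["answer"]
        result = tools[step["tool"]](**step.get("args", {}))
        history.append({"tool": step["tool"], "result": result})
    return "Stopped: tool-call limit reached."


tools = {"current_time": lambda: "09:00"}


def scripted_model(history):
    # First ask for the time, then answer using the tool result.
    if not history:
        return {"tool": "current_time", "args": {}}
    return {"answer": f"It is {history[-1]['result']}."}


calls = []


def looping_model(history):
    # A model that never answers, to show the iteration cap.
    calls.append(1)
    return {"tool": "current_time"}


answer = run_agent(scripted_model, tools)
limited = run_agent(looping_model, tools)
```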
Architecture
┌─────────────────────────────────────────────────────────────┐
│ User Devices │
│ Web Browser (PWA) · iOS/Android (Flutter shell) │
└──────────────────────────────┬──────────────────────────────┘
│ HTTPS · SSE · WebSocket
┌──────────────────────────────▼──────────────────────────────┐
│ AI Backend (Python / FastAPI) │
│ ┌─────────────┐ ┌──────────────┐ ┌─────────────────────┐ │
│ │ Auth & │ │ Agent loop │ │ 16+ bundled skills │ │
│ │ rate limits │→ │ (≤20 iter) │→ │ (lazy activation) │ │
│ └─────────────┘ └──────────────┘ └─────────────────────┘ │
└───────┬───────────────────────────────┬─────────────────────┘
│ │
┌────▼─────┐ ┌────▼─────────────────────┐
│ MongoDB │ │ External services │
│ users · │ │ LLM · Deepgram · search │
│ chats · │ │ maps · Voyage AI · poyo │
│ notes · │ │ HyperFrames · downsub │
│ files │ └──────────────────────────┘
└────┬─────┘
┌────▼─────┐ ┌──────────────────┐
│ Qdrant │ │ CloakBrowser │
│ (RAG) │ │ (JS web scraping) │
└──────────┘ └──────────────────┘
Tech Stack
Backend (Nanobot)
- Python 3.11, FastAPI, LiteLLM, MongoDB, Qdrant, Railway Volume file storage
- Deepgram Nova-3 STT, Voyage AI embeddings, Web Push (VAPID), HyperFrames + ffmpeg
Frontend
- Vue 3, Vite 7, PWA, Marked + KaTeX + Markmap, jit-viewer, Amap/Google Maps
Mobile
- Flutter WebView + window.JabotNative bridge, Firebase push
LLM Providers (via LiteLLM)
- Anthropic, OpenAI, OpenRouter, DeepSeek, Groq, Zhipu, Gemini, self-hosted vLLM
One assistant where your knowledge, files, and tools stay connected — built to be useful every day.

