Jacky AI Agent

This project is publicly accessible. Click to access.

Published: 7 June 2026

22 minute read

In One Sentence

Jacky AI Agent (Jabot) is a personal AI assistant I designed, built, and deployed end-to-end — centred on persistent knowledge (global cloud drive + RAG notes), deep exploration (daily seeds & note sprouting), grounded, tool-backed answers (cited web search, video parsing, maps, documents, code sandbox, image & animation generation), and life management (memos, timed reminders, unified push notifications).

Try it: jacky-info.com/jackybot/

The Problem

Most AI chat products treat every conversation as an island. Files disappear inside old threads, notes never feed back into answers, and the model freely invents facts with no sources. Video links on Bilibili or YouTube are often unreadable. Location, weather, and maps require switching apps. Document work, code execution, and media creation live in separate tools. Reminders and memos, if they exist at all, do not push to your device when they matter.

Jabot addresses these gaps as one integrated system — not a chat box with bolt-on features, but a personal knowledge and action layer where uploads, AI outputs, notes, citations, and scheduled notifications all connect.

Ten Core Capabilities

These are the main design pillars — each solves a distinct pain point.

1. Global Cloud Drive — Upload Once, Reference Everywhere

Pain point: In typical AI chats, files are trapped inside a single conversation. You re-upload the same PDF three times, or scroll through dozens of old threads to find something the AI generated last week.

What Jabot does:

One global cloud drive shared across all conversations — not per-chat attachments.
Two logical zones: user uploads (用户上传的文件) and AI-generated files (AI生成的文件) — Word exports, sandbox outputs, generated images, rendered MP4s, etc. all land in the agent zone automatically.
Upload once, reference in any chat: pick files, folders, or entire directory trees from the reference picker; attach as chips or folder references without re-uploading.
Folder upload & batch drag-and-drop with progress tracking; in-browser preview for PDF, Office, images, and 3D models (GLTF/OBJ/STL/PLY).
Storage quotas: 2 GiB (free) / 20 GiB (Pro); separate accounting for uploads vs agent-generated storage.
Sandbox integration: load cloud files into an isolated execution environment, run transformations, then sync results back to the cloud drive — the output persists even after the chat ends.

The cloud drive is the single source of truth for everything you and the AI produce. You never hunt through scattered conversations for a file again.

2. Notes as a Growing RAG Knowledge Base

Pain point: Saving snippets in chat history does not make the AI smarter over time. Keyword search misses meaning. Knowledge stays siloed.

What Jabot does:

Save anything into notes: chat replies, web pages, video summaries, voice captures, manual entries — from the Knowledge Base page or via the notes_save tool during conversation.
Automatic metadata: LLM generates title (≤20 chars) and 3–4 tags on save if you do not provide them.
Folder organisation, sharing (public / private links), swipe-style reading, full source text preservation.
Full RAG pipeline:
- On save → content written to MongoDB → async embedding via Voyage AI (voyage-4-lite, 1024-dim) → vector stored in Qdrant with per-user isolation.
- On query → notes_search runs semantic search (cosine similarity + relevance gate) or keyword / list modes.
- Agent retrieves relevant notes and injects them into context — answers grounded in your library, not generic training data.
Compounding personalisation: the more you add, the richer retrieval becomes. Factual questions trigger parallel notes_search + web_search_jacky so personal knowledge and fresh web data combine in one answer.
Citation markers [^N] on claims drawn from notes or web — clickable source list at the bottom of every reply.

Your notes are not a dead archive; they are a living knowledge graph that shapes every future answer.

3. Daily Seeds & Note Sprouting — Deep Exploration Engine

Pain point: You save notes but rarely revisit them. Generic AI prompts ignore your interests and today's world. Insights stay shallow.

What Jabot does:

Today's Seeds (每日种子)

Each day, the system generates a personalised seed pool (default ~12 seeds) anchored to today's real-world news and trends.
Generation pipeline:
1. Parallel web searches — global news, tech trends, China hot topics, English headlines, plus queries derived from your interest tags.
2. LLM synthesises candidate seeds using your user profile summary and interests — multi-domain, vivid, date-specific.
3. Seeds cached per user per day; nightly job refreshes pools for active users.
Each seed has text, topic, and searchHint. Tap a seed → AI summarises with cited sources → optionally sprout into a full exploration report.
Sprouted seeds are tracked so you can see what you have already explored.

Note Sprouting (笔记发芽)

Pick one note, today's notes, or a daily seed as input.
Two-stage engine:
1. Extract key themes → keyword + semantic search across your note library → 1–2 web searches for cross-domain context.
2. Deep LLM analysis produces structured output: seed (core insight kernel), aha moments (cross-domain surprises), topics, explorations (question + insight pairs), search keywords, and a summary report.
Results can be saved directly back to notes — knowledge compounds again.
Usage limits: 1 sprout/day (free), 3/day (Pro).

Together, seeds pull you toward today's hotspots aligned with your profile; sprouting pushes you deeper into what you already know — turning storage into discovery.

4. Bilibili & YouTube — Full Content Parsing

Pain point: Pasting a video link into most AI tools yields "I cannot access this video." You still need to watch or manually transcribe.

What Jabot does:

Dedicated Content Hub / VideoHub pipeline for Bilibili and YouTube URLs.
Transcript extraction:
- YouTube — Downsub API + yt-dlp fallback.
- Bilibili — yt-dlp CC subtitles; URL short-link resolution.
- No subtitles? Downloads audio and runs Deepgram speech recognition.
Structured output card with three tabs:
- Source — full transcript with timestamps (video) or scraped page text (web links).
- Summary — streaming AI synopsis in your language.
- Mind map — interactive Markmap visualisation of the content structure.
One-click save to notes — transcript, summary, and mind map persist into your RAG library for future retrieval.
Also handles generic web links via the same hub (scraping + summarise + mind map).

Video and link content becomes first-class knowledge — parsed, summarised, visualised, and storable in seconds.

5. Global Web Search with Source Citations

Pain point: AI answers sound confident but cite nothing. Hallucinated facts are indistinguishable from real ones.

What Jabot does:

Multi-provider search with automatic failover: Search1API (incl. Bilibili/Zhihu-targeted search) → Brave → SerpAPI → SearchAPI.io. Multiple API keys load-balance; exhausted keys switch automatically.
Web scraping for page-level detail — CloakBrowser microservice (58 anti-detection patches, JS rendering, in-page image OCR) with httpx + Readability fallback.
Mandatory citation discipline: whenever web_search_jacky, web_scraping, or notes_search runs, the agent must attach [^N] markers inline; the frontend renders a collapsible 参考来源 footer with clickable links (external URLs) or note references (open in Knowledge Base).
Citations are auto-built from tool results — not LLM-invented URLs.
Domain blocklist support for filtering unwanted sources.

Search is not a black box — every factual claim can be traced to a source.

6. Real-Time Location, Maps & Weather

Pain point: "What's the weather near me?" or "Find a café on my route" requires leaving the chat and opening separate map apps.

What Jabot does:

Browser geolocation (current_location) — latitude, longitude, accuracy injected into agent context when permitted.
Dual map providers: Google Maps (international) and Amap 高德 (China) — user preference persisted via set_map_preference.
Navigation skill: geocoding, place search, turn-by-turn directions, live traffic, place details (hours, rating, phone, photos).
Weather skill: forecast by location, air quality index.
Context injection: timezone, map preference, and location flow into jacky_context on every conversation — the agent knows where and when you are asking from.
Local POI recommendations combine web discovery with map-verified place details — not unverified search snippets.

Location-aware answers without app-switching.

7. Full Document Editing & Academic Assistant

Pain point: AI can chat about documents but cannot reliably produce or edit Office files, PDFs, or formatted citations.

What Jabot does:

Dedicated skills for each format, executed inside the per-user sandbox:
- Word (docx) — create, edit, convert documents.
- PowerPoint (pptx) — create, edit, slide thumbnails.
- Excel (xlsx) — create, edit, formulas.
- PDF (pdf) — 9 operation types: merge, split, form fill, convert, and more.
Workflow: agent writes/edits in sandbox → sandbox_sync_to_cloud → file appears in AI生成的文件 → preview in browser or download.
file_read reads cloud files back into context for iterative editing across sessions.
Academic research skill:
- academic_paper_search — find papers across academic databases.
- format_citation — APA, MLA, Chicago, and other citation styles.
Mind map skill — Markdown mind maps + PNG export for any topic or parsed content.

From draft to formatted deliverable — stored in your cloud drive, not lost in chat.

8. Sandbox & Code Execution

Pain point: You need to run a script, transform data, or batch-process files — but chat-only AI cannot execute anything.

What Jabot does:

Per-user isolated shell environment on the server.
Tools: sandbox_run_command, sandbox_write_file, sandbox_read_file, sandbox_patch_file, sandbox_list_dir.
Cloud bridge: sandbox_load_from_cloud pulls your uploads or agent files in; sandbox_sync_to_cloud pushes outputs back.
Configurable timeout (default 60 s, max 600 s).
Used by document skills, animation rendering, data processing, and any task the agent deems needs code.

The agent does not just suggest commands — it runs them safely and persists results.

9. Image Generation, Editing & HTML→MP4 Animation

Pain point: Image and video creation require separate subscriptions and manual file management.

What Jabot does:

Image Generation & Editing

image_generate — text-to-image via poyo.ai gpt-image-2; configurable quality and resolution.
image_edit — modify existing images (inpainting, style change, background removal, etc.) via gpt-image-2-edit.
Input from cloud paths or attachment IDs — works on both user uploads and prior AI-generated images under AI图片生成/.
Background task + Web Push notification when ready — no need to keep the tab open.
Output saved to AI生成的文件/AI图片 — immediately referenceable in future chats.

HTML Animation → MP4

animation_demo skill — author HTML + GSAP animations following HyperFrames conventions (1920×1080 compositions, timeline registry).
Agent writes HTML in sandbox → syncs to cloud → animation_render_video renders MP4 server-side (Node.js HyperFrames + ffmpeg).
Official motion-design doc library (animation_read_doc) for styles, typography, transitions, and component catalog.
Finished MP4 lands in cloud drive — shareable and previewable like any other file.

Creative output is treated like any other asset — generated, stored, and reusable globally.

10. Memos, Timed Reminders & Push Notifications

Pain point: AI chats forget what you asked them to remind you of. Phone alarm apps do not understand natural language. Task lists and timed alerts live in different apps, and web-based AI tools rarely notify you when something is due — especially when the tab is closed.

What Jabot does:

Two distinct modes — memos vs reminders

Jabot separates what to remember from when to notify:

Type	Tool	Use case
Memo (备忘)	`todo_manager`	Quick capture, checklists, tasks without a specific trigger time — e.g. "buy milk", "follow up with client"
Timed reminder (提醒)	`reminder_manager`	Alerts that must fire at a concrete date/time — e.g. "remind me at 9am every weekday", "in 30 minutes"

The agent is instructed not to mix them: timed notifications always go through reminder_manager; memo-style task tracking always goes through todo_manager.

Reminder capabilities

One-time reminders: specify firstRunDate (today / tomorrow / YYYY-MM-DD), or relativeMinutes ("in 30 minutes").
Recurring reminders: iCalendar RRULE rules — daily, weekly (e.g. Mon/Wed/Fri), monthly (e.g. 1st of month), with custom end dates.
Timezone-aware: every schedule carries an IANA timezone (Asia/Shanghai, etc.); the backend computes dtstart — no manual timestamp math.
Holiday exclusion: pass exdatesMs to skip public holidays (current year + next year); backend normalises to midnight in the user's timezone.
Natural-language creation: home-screen quick action or /ai/home/quick-reminder API — LLM parses "every Monday at 10am" into structured RRULE + time + timezone.
Agent-managed: during chat, the scheduling skill creates, updates, lists, and deletes reminders on your behalf.

Memo (todo) capabilities

Create, edit, complete, and delete memos — supports multi-line batch input (one line = one memo).
Optional due date and priority (low / medium / high).
Task Centre UI with dedicated views:
- Reminder tab — calendar layout showing upcoming timed alerts; click to view/edit/delete.
- Memo tab — pending and completed lists; inline edit and mark-done.
- Weekly / monthly stats — reminder completion rate, memo totals, completion percentage.
Quick capture from home screen ("添加备忘") or chat ("添加备忘：…") without starting a full conversation.

Background schedule engine

Persistent storage in MongoDB (schedule_items collection).
Background ScheduleService continuously computes next run times via RRULE, handles misfires (6-hour grace window), retries with exponential backoff, and cleans up stale jobs.
When a reminder fires, the payload (title + message) is delivered through the unified push layer.

Unified push — Web Push + native FCM

Notifications are not an afterthought — they are a first-class delivery channel:

Browser Web Push (VAPID):
- User enables notifications in the User Centre → subscription stored in MongoDB.
- Service Worker receives push even when the PWA tab is closed.
- Test notification button to verify the pipeline end-to-end.
- Stale subscriptions (404/410) auto-disabled; delivery errors tracked per endpoint.
Native app push (Firebase FCM):
- Flutter WebView shell registers FCM tokens via JabotNative bridge.
- Same unified send_user_push_notification API delivers to both Web Push and FCM in one call.
Deep links: tapping a reminder notification opens /jackybot/super-agent?reminderId=… — straight to the reminder detail, including occurrence timestamp for recurring items.
Background task notifications: long-running jobs (image generation, image editing, animation render) also push when complete — "your image is ready in cloud drive" — so you do not need to keep the tab open.

Reminders and memos turn Jabot from a session-based chat into something that follows you through the day — capture in natural language, schedule with timezone precision, get notified on web or mobile.

Supporting Capabilities

Beyond the nine pillars, Jabot also includes:

Conversation & Agent Loop

Multi-turn streaming chat with visible thinking steps and tool progress (SSE).
Up to 20 tool-call iterations per request; lazy skill activation to control token cost.
Conversation history with automatic compression, session title generation, and full-text search.
Voice input — Chinese, English, auto-detect (upload or real-time WebSocket via Deepgram Nova-3).
Model selection per session (Claude, GPT, DeepSeek, Gemini, Groq, Zhipu, self-hosted vLLM, etc.).
Long-term user memory — preferences and facts persist across sessions; nightly consolidation job.

Cross-Platform Access

Web: Vue 3 PWA — installable on desktop and mobile home screen; dark / light theme.
Mobile: Flutter WebView shell with native bridge (haptics, secure storage, push notifications).
Service Worker offline cache.

Security & Quotas

JWT authentication (180-day tokens); brute-force protection; optional hCaptcha; beta whitelist mode.
Rate limiting (40–120 chat requests/min by tier); monthly token quotas (2M free / 20M Pro).
Session concurrency limits, upload size limits (30 MB per file), storage quotas.

Agent Skill System (16+ Bundled Skills)

Skills are tool bundles the AI activates on demand. Only core and communication load by default; the agent calls use_skill (or you type /skill-id) to load others.

Skill	Default	Tools	What it enables
`core`	✅	`use_skill`, `manage_user_memory`, `current_time`	Skill activation, long-term memory, timezone-aware time
`communication`	✅	`terminate`	Gracefully end the agent loop when done
`user_location`	—	`current_location`	Browser geolocation
`weather`	—	`geocoding`, `weather_api`, `air_quality`	Forecasts and air quality
`web`	—	`web_search_jacky`, `web_scraping`, `video_content_extract`	Search, page scrape, video transcripts
`navigation`	—	`geocoding`, `directions`, `place_details`, `route_traffic`, `set_map_preference`	Maps, routes, places, traffic
`notes`	—	`notes_save`, `notes_update`, `notes_search`, `notes_list_folders`	Save, edit, semantic search notes
`scheduling`	—	`reminder_manager`, `todo_manager`	Timed reminders (RRULE) and memo todos
`sandbox`	—	`sandbox_run_command`, `sandbox_write_file`, `sandbox_read_file`, `sandbox_patch_file`, `sandbox_list_dir`, `sandbox_load_from_cloud`, `sandbox_sync_to_cloud`	Code execution and file ops
`files`	—	`file_read`, `list_dir_cloud`, `resource_download`	Cloud disk access
`media`	—	`pexels_search`	Stock image search
`mindmap`	—	`generate_mindmap`	Mind map generation
`docx` / `pptx` / `xlsx` / `pdf`	—	(via sandbox)	Office & PDF create / edit / convert
`academic_research`	—	`academic_paper_search`, `format_citation`	Paper search and citations
`image_generate` / `image_edit`	—	background image tasks	AI image creation and editing
`animation_demo`	—	`animation_read_doc`, `animation_render_video`	HTML animation → MP4

Real-World Impact

My most complete personal project — it combines ideas from KnowFlow (knowledge capture), Easy Life Agent (life management), and enterprise patterns from Project Dashboard / Product Scanner into one production system deployed at jacky-info.com/jackybot/.

The design goal is not "another chatbot" but a daily-use personal intelligence layer: everything you learn, create, and ask stays connected — searchable, citable, and reusable tomorrow.

Skills Demonstrated

Area	What this project shows
AI product design	Multi-pillar UX (cloud + RAG + exploration + grounded search), skill system, memory
Full-stack development	Python/FastAPI + Vue 3 + Flutter
RAG & vector search	Voyage AI embedding, Qdrant semantic retrieval, relevance filtering
Media & content pipelines	Video transcript, mind map, animation render, image generation
Cloud & DevOps	Auth, quotas, rate limits, volume storage, Web Push (VAPID) + FCM
Integration	LLM, search, speech, maps, scraping, academic APIs, HyperFrames

How It Works (Plain English)

You interact via web or mobile — chat, quick tools, or resource hubs (cloud / notes / seeds).
The FastAPI backend authenticates, checks quotas, and starts an agent loop.
The agent selects skills, calls tools (search notes, scrape a video, run sandbox code, etc.), and streams results via SSE.
Anything worth keeping — notes, files, mind maps, generated documents — lands in your global cloud or RAG notes; timed reminders and memos persist in the Task Centre and notify via push when due.
Next time you ask, retrieval pulls your history + fresh web data + location context — answers get more personal and better sourced over time.

Architecture

┌─────────────────────────────────────────────────────────────┐
│                        User Devices                          │
│         Web Browser (PWA)  ·  iOS/Android (Flutter shell)    │
└──────────────────────────────┬──────────────────────────────┘
                               │ HTTPS · SSE · WebSocket
┌──────────────────────────────▼──────────────────────────────┐
│                   AI Backend (Python / FastAPI)               │
│  ┌─────────────┐  ┌──────────────┐  ┌─────────────────────┐ │
│  │ Auth &      │  │ Agent loop   │  │ 16+ bundled skills   │ │
│  │ rate limits │→ │ (≤20 iter)   │→ │ (lazy activation)   │ │
│  └─────────────┘  └──────────────┘  └─────────────────────┘ │
└───────┬───────────────────────────────┬─────────────────────┘
        │                               │
   ┌────▼─────┐                    ┌────▼─────────────────────┐
   │ MongoDB  │                    │ External services         │
   │ users ·  │                    │ LLM · Deepgram · search  │
   │ chats ·  │                    │ maps · Voyage AI · poyo  │
   │ notes ·  │                    │ HyperFrames · downsub    │
   │ files    │                    └──────────────────────────┘
   └────┬─────┘
   ┌────▼─────┐     ┌──────────────────┐
   │ Qdrant   │     │ CloakBrowser      │
   │ (RAG)    │     │ (JS web scraping) │
   └──────────┘     └──────────────────┘

Tech Stack

Backend (Nanobot)

Python 3.11, FastAPI, LiteLLM, MongoDB, Qdrant, Railway Volume file storage
Deepgram Nova-3 STT, Voyage AI embeddings, Web Push (VAPID), HyperFrames + ffmpeg

Frontend

Vue 3, Vite 7, PWA, Marked + KaTeX + Markmap, jit-viewer, Amap/Google Maps

Mobile

Flutter WebView + window.JabotNative bridge, Firebase push

LLM Providers (via LiteLLM)

Anthropic, OpenAI, OpenRouter, DeepSeek, Groq, Zhipu, Gemini, self-hosted vLLM

One assistant where your knowledge, files, and tools stay connected — built to be useful every day.

Junchuan Zhang | Jacky (He/His)

In One Sentence

The Problem

Ten Core Capabilities

1. Global Cloud Drive — Upload Once, Reference Everywhere

2. Notes as a Growing RAG Knowledge Base

3. Daily Seeds & Note Sprouting — Deep Exploration Engine

Today's Seeds (每日种子)

Note Sprouting (笔记发芽)

4. Bilibili & YouTube — Full Content Parsing

5. Global Web Search with Source Citations

6. Real-Time Location, Maps & Weather

7. Full Document Editing & Academic Assistant

8. Sandbox & Code Execution

9. Image Generation, Editing & HTML→MP4 Animation

Image Generation & Editing

HTML Animation → MP4

10. Memos, Timed Reminders & Push Notifications

Two distinct modes — memos vs reminders

Reminder capabilities

Memo (todo) capabilities

Background schedule engine

Unified push — Web Push + native FCM

Supporting Capabilities

Conversation & Agent Loop

Cross-Platform Access

Security & Quotas

Agent Skill System (16+ Bundled Skills)

Real-World Impact

Skills Demonstrated

How It Works (Plain English)

Architecture

Tech Stack

Backend (Nanobot)

Frontend

Mobile

LLM Providers (via LiteLLM)