Zotero AI Sidebar
5-Minute Quick Start · configure → ask → save it to your Zotero note
QUICK START · HOW TO USE

Run the full AI reading loop from scratch

Six steps and twelve product-faithful mockups walk through one full pass: configure a model → bind a paper → ask → audit context → save to note → translate a PDF sentence by sentence. Every step comes with numbered instructions, and the mockups mirror the real sidebar (two-row toolbar, composer status bar, "Thinking & context" block, action buttons), so what you see here is what you get after installing the XPI.

STEP 01 Configure your first model preset
With no preset configured, the sidebar opens straight into the add-preset state. Always run Test connection before saving, so a bad key fails here rather than halfway through a question.
  1. Open Zotero and switch the item pane on the right of the main window to the AI tab; you'll see the "No model preset yet" card.
  2. Click + Add preset to expand the preset editor.
  3. Pick a Provider (openai / anthropic), give the preset a recognizable name, paste your API Key (masked by default; click the eye icon to reveal it), and set the Base URL (the official endpoint or any OpenAI-compatible one).
  4. In Models, click chips in the OpenAI presets grid, or type an ID into the custom model box and click Add. Each chip can be removed individually; the bordered chip at the top is the current default model.
  5. Tune Max tokens (the per-response output cap) and the Reasoning Summary level.
  6. Click Test connection, wait for the green check, then click Save.
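The checks behind steps 3 to 6 can be sketched roughly as follows. This is a minimal illustration, not the plugin's code: the `Preset` shape, the `validate` rules, and `build_test_request` are hypothetical names, and it assumes a connection test on an OpenAI-compatible endpoint amounts to listing models with a Bearer token. The request is only built here, never sent.

```python
from dataclasses import dataclass, field
from urllib.request import Request


@dataclass
class Preset:
    """Hypothetical in-memory shape of one model preset (not the plugin's schema)."""
    provider: str
    name: str
    api_key: str
    base_url: str
    models: list[str] = field(default_factory=list)
    max_tokens: int = 4096


def validate(preset: Preset) -> list[str]:
    """Return human-readable problems; an empty list means the form looks complete."""
    problems = []
    if preset.provider not in ("openai", "anthropic"):
        problems.append("provider must be 'openai' or 'anthropic'")
    if not preset.api_key.strip():
        problems.append("API key is empty")
    if not preset.base_url.startswith(("http://", "https://")):
        problems.append("base URL must start with http:// or https://")
    if not preset.models:
        problems.append("add at least one model")
    if preset.max_tokens <= 0:
        problems.append("max tokens must be positive")
    return problems


def build_test_request(preset: Preset) -> Request:
    """Build (but do not send) a model-listing request for an OpenAI-compatible endpoint."""
    url = preset.base_url.rstrip("/") + "/models"
    headers = {"Authorization": f"Bearer {preset.api_key}"}
    return Request(url, headers=headers, method="GET")
```

Validating before building the request mirrors the advice above: catch an empty key or a malformed Base URL at configuration time instead of in the middle of a conversation.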
1A Empty preset state · onboarding card · No key, no chat, by design
Zotero · Item Pane
Info
Notes
AI
No model preset yet
Configure at least one provider + API key + base URL + model ID to start chatting.
+ Add preset · Import JSON
1B Preset form · model chips + presets grid · Save only after the green check
Zotero · AI Sidebar · Settings
GPT (openai) · Delete
Provider: openai
Name: GPT (openai)
API Key: ••••••••••••••••••••sk-Xq9F 👁
Base URL: https://api.openai.com/v1
Models
gpt-5.5 × gpt-5 × gpt-4.1 × o3 ×
OpenAI presets
custom-model-id… · Add
Max tokens: 4096
Reasoning Summary: Medium (default)
Test connection · Save · Connection OK · 320 ms
Why is the API key masked? Keys live only in local Zotero prefs. WebDAV sync and config export both strip them, so a key never travels to the cloud. You'll need to re-enter it on a new machine; that's intentional.
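Stripping keys on export can be pictured with a short sketch. The config layout (`presets` list) and the `api_key` field name are assumptions made for illustration; the plugin's real export format is not shown on this page.

```python
import copy
import json

# Hypothetical field names treated as secrets; adjust to the real schema.
SECRET_FIELDS = {"api_key"}


def export_config(config: dict) -> str:
    """Serialize a config for export or sync with every secret field removed."""
    safe = copy.deepcopy(config)  # never mutate the live, in-memory config
    for preset in safe.get("presets", []):
        for name in SECRET_FIELDS:
            preset.pop(name, None)
    return json.dumps(safe, indent=2, sort_keys=True)
```

Working on a deep copy keeps the local preset usable after export, which matches the behaviour described above: the key stays on this machine and is simply absent from anything that leaves it.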
STEP 02 Open the sidebar & bind to a paper
Selecting an item in the main window binds the sidebar to it. Chat history, context, and notes are stored per paper, so switching papers never leaks one conversation into another.
  1. In the items list of Zotero's main window, select the paper you want to read.
  2. Switch the item pane on the right to the AI tab. A two-row toolbar appears at the top:
    Row 1: AI Chat tab · current preset (click to switch) · Copy MD · Clear
    Row 2: Open Note · Settings · Translate · Hide · Font 13px · Debug toggle
  3. A welcome bubble below confirms the sidebar is ready. Each paper has its own thread, so switching papers switches the conversation.
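The "one thread per paper" rule can be modelled as a store keyed by item ID. This is a sketch under assumptions: `ThreadStore` and its methods are hypothetical names, and persistence is left out.

```python
from collections import defaultdict


class ThreadStore:
    """Keeps one independent chat thread per Zotero item ID (in memory only)."""

    def __init__(self) -> None:
        self._threads: dict[int, list[dict]] = defaultdict(list)
        self.current_item_id: int | None = None

    def bind(self, item_id: int) -> list[dict]:
        """Selecting an item binds the sidebar and returns that item's thread."""
        self.current_item_id = item_id
        return self._threads[item_id]

    def append(self, role: str, text: str) -> None:
        """Add a message to the thread of the currently bound item."""
        if self.current_item_id is None:
            raise RuntimeError("no item is bound yet")
        self._threads[self.current_item_id].append({"role": role, "text": text})
```

Because every message is filed under the bound item ID, binding a different paper yields a different list, and returning to the first paper restores its history untouched.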
2A Main window · paper selected · Selection = binding
Zotero — My Library
Library
  • 📚 My Library
  • 📁 VLA Papers
  • 📁 Robotics
  • 📁 Imitation
  • 🏷️ Tags
RT-2: Vision-Language-Action Models · 2023
π0.5: a Vision-Language-Action Model · 2025
OpenVLA: Open-Source VLA · 2024
Octo: Open-Source Generalist Policy · 2024
Item Pane · AI
π0.5: a Vision-Language-Action Model
Item 1241 · 2025
Ready. Once a model preset is configured, you can ask about the current Zotero item or its PDF directly.
2B Sidebar · two-row toolbar · ready · 1:1 product mirror
Zotero · AI Sidebar
AI Chat · GPT (openai) · Copy MD · Clear
Open Note · Settings · Translate · Hide · Font 13px (default) · Debug
π0.5: a Vision-Language-Action Model with Open-World Generalization
Item ID: 1241
AI

Bound to π0.5. Ask anything; I'll fetch the abstract, full PDF text, annotations, or your current selection on demand.

Ask anything… (Enter to send, Shift+Enter for newline)
Ln 1, Col 1 | 📷 Screenshot | 🖼 Image | gpt-5.5 | Reasoning · Low | YOLO
STEP 03 Compose the first question (selection · image · status bar)
Selecting text in the Reader raises an orange selection chip. The status bar shows the caret position, current model, Reasoning level, and YOLO toggle.
  1. To ask directly, click into the composer, or pick one of the quick prompts above it (Summarize / Explain method / Pull numbers).
  2. Select a passage in the PDF Reader. An orange selection chip appears at the top of the composer and attaches that exact passage to the next message (click × to drop it).
  3. To attach an image, click Screenshot (captures the current PDF viewport) or Image (picks a local file) in the status bar. The thumbnail appears above the input box and can be removed with ×.
  4. As you type, the status bar shows the caret position (Ln · Col) on the left and, on the right, the current model (gpt-5.5), the Reasoning level (Low / Medium / High), and the YOLO toggle.
  5. Press Enter to send, Shift+Enter for a newline. The orange chip is consumed on send.
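The composer behaviour in the steps above (a selection that is attached once and consumed on send, plus the Ln · Col readout) can be sketched as below. `Composer` and `ln_col` are hypothetical names; clearing attached images on send is an assumption, since the page only states that the selection chip is consumed.

```python
from dataclasses import dataclass, field
from typing import Optional


def ln_col(text: str, offset: int) -> tuple[int, int]:
    """Convert a caret offset into 1-based (line, column), as a status bar would show."""
    before = text[:offset]
    line = before.count("\n") + 1
    col = offset - (before.rfind("\n") + 1) + 1
    return line, col


@dataclass
class Composer:
    text: str = ""
    selection: Optional[str] = None  # the orange chip: PDF text picked in the Reader
    images: list[str] = field(default_factory=list)

    def send(self) -> dict:
        """Package the draft, then reset it so the chip applies to one message only."""
        if not self.text.strip():
            raise ValueError("nothing to send")
        message = {
            "text": self.text.strip(),
            "selection": self.selection,
            "images": list(self.images),
        }
        self.text, self.selection = "", None
        self.images.clear()  # assumption: attachments are also one-shot
        return message
```

An empty draft reports `ln_col("", 0) == (1, 1)`, matching the "Ln 1, Col 1" readout in the empty-composer mockup.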
3A Empty composer · full status bar · Status bar matches the real product
Zotero · AI Sidebar
AI Chat · GPT (openai) · Copy MD · Clear
Open Note · Settings · Translate · Hide · Font 13px (default) · Debug
π0.5: a Vision-Language-Action Model with Open-World Generalization
Item ID: 1241
Ask anything… (Enter to send, Shift+Enter for newline)
Ln 1, Col 1 | 📷 Screenshot | 🖼 Image | gpt-5.5 | Reasoning · Low | YOLO
3B Typed · selection chip + image · Orange = PDF selection attached
Zotero · AI Sidebar
AI Chat · GPT (openai) · Copy MD · Clear
Open Note · Settings · Translate · Hide · Font 13px (default) · Debug
π0.5: a Vision-Language-Action Model with Open-World Generalization
Item ID: 1241
📎 Selection: "While most robotic learning systems collapse…" ×
FIG 3 ×
Summarize this paper in 5 lines, call out the core contribution and biggest limitation, and use the attached image to explain Figure 3.
Ln 2, Col 23 | 📷 Screenshot | 🖼 Image | gpt-5.5 | Reasoning · High | YOLO
STEP 04 Read the "Thinking & context" block: did the AI actually read the PDF?
The "Thinking & context" block above every answer shows exactly what was sent this turn. Blue spinner = tool in flight; all green = done.
  1. After you send, a "Thinking & context" header appears above the assistant bubble, stating what was actually sent this turn (e.g. "Sent full PDF, 96241 chars").
  2. Below the header, orange chips list the context forced into this turn (full PDF, annotations, selection, attached images).
  3. Next comes the tool-call list: green dot = finished, blue spinner = in flight. Each row shows tool name + argument summary + character count.
  4. While the answer streams, you can click Stop at any time (the composer status bar switches to show it). When the turn finishes, the whole block collapses by default; click the header to expand it and read the model's reasoning summary.
  5. The answer body is rendered Markdown (headings, lists, code blocks, quotes, links, bold, inline code). Each trace row can be expanded separately to inspect its raw input/output.
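The tool-call list described above is essentially a sequence of rows with a status each. A minimal sketch of that data structure follows; `ToolCall`, `Status`, and `summarize` are hypothetical names, and the `FAILED` state is an assumption based on the "Passage not found" row in the mockup rather than documented behaviour.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    RUNNING = "running"  # blue spinner
    DONE = "done"        # green dot
    FAILED = "failed"    # assumption: e.g. a passage that could not be located


@dataclass
class ToolCall:
    name: str
    detail: str
    status: Status


def summarize(trace: list[ToolCall]) -> dict:
    """Count rows per status and report whether the turn has finished running."""
    counts = Counter(call.status for call in trace)
    return {
        "done": counts[Status.DONE],
        "running": counts[Status.RUNNING],
        "failed": counts[Status.FAILED],
        "finished": counts[Status.RUNNING] == 0,
    }
```

Note that `finished` only says no tool is still in flight; it says nothing about whether the model used the results correctly.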
4A Streaming · "Thinking & context" expanded · Blue dot = tool in flight
Zotero · AI Sidebar
AI Chat · GPT (openai) · Copy MD · Clear
Open Note · Settings · Translate · Hide · Font 13px (default) · Debug
π0.5: a Vision-Language-Action Model with Open-World Generalization
Item ID: 1241
Summarize the core idea and experiment storyline, and highlight the key training-pipeline sentences on the PDF.
AI
Thinking & context · Sent full PDF, 79952 chars
Full PDF · 79952 chars sent this turn
zotero_get_current_item · read item metadata
zotero_get_reader_pdf_text · read Reader PDF text 79952/79952
zotero_annotate_passage · p.1 highlight +23 chars
zotero_annotate_passage · p.1 highlight +20 chars
zotero_annotate_passage · p.2 highlight +19 chars
zotero_annotate_passage · p.8 highlight
annotating · writing PDF highlights…
Streaming…
Stop | gpt-5.5 | Reasoning · High | YOLO
4B Done · Markdown body · Action buttons always visible
Zotero · AI Sidebar
AI Chat · GPT (openai) · Copy MD · Clear
Open Note · Settings · Translate · Hide · Font 13px (default) · Debug
π0.5: a Vision-Language-Action Model with Open-World Generalization
Item ID: 1241
AI
Thinking & context · Sent full PDF, 79952 chars
Full PDF · 79952 chars sent this turn
zotero_get_current_item · read item metadata
zotero_get_reader_pdf_text · read Reader PDF text 79952/79952
zotero_annotate_passage · p.1 highlight +23 chars
zotero_annotate_passage · p.1 highlight +20 chars
zotero_annotate_passage · p.2 highlight +19 chars
zotero_annotate_passage · Passage not found (low confidence)
zotero_annotate_passage · p.8 highlight +18 chars
zotero_annotate_passage · p.11 highlight +14 chars

PDF highlights done: 9 saved. One training-pipeline sentence was not found twice in a row and was dropped per your rule.

Storyline. The paper asks whether VLA robot models can generalize to open-world households outside the lab. The authors propose π0.5, which builds on π0 and co-trains on heterogeneous data: mobile manipulation, other robots, high-level semantic sub-task prediction, language instructions, and web multimodal data. At inference the model is hierarchical: it first predicts a semantic sub-task, then generates low-level action chunks. Experiments show this knowledge transfer is critical for generalization; π0.5 performs long-horizon, multi-stage, dexterous tasks (cleaning a kitchen or bedroom) in new homes unseen during training.

Highlights · done · View
Ask anything… (Enter to send, Shift+Enter for newline)
Ln 1, Col 1 | 📷 | 🖼 | gpt-5.5 | Reasoning · High | YOLO
Is an all-green trace enough? Not necessarily. A green dot means the tool call succeeded, not that the model used its output correctly. When you doubt a conclusion, expand the corresponding tool row and compare its input/output against the source PDF.
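One way to do that comparison mechanically is to check whether a quoted passage really occurs in the extracted PDF text. The helper below is a hedged sketch with hypothetical names; it normalizes whitespace and case only, so hyphenation across lines or OCR noise would still need manual inspection.

```python
import re


def normalize(text: str) -> str:
    """Collapse all whitespace runs and lowercase, so line breaks don't matter."""
    return re.sub(r"\s+", " ", text).strip().lower()


def passage_in_source(passage: str, source: str) -> bool:
    """True if the passage appears verbatim in the source after normalization."""
    return normalize(passage) in normalize(source)
```

A `False` result does not prove the model invented the quote, only that an exact match was not found; treat it as a prompt to open the trace row and read the original page.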
STEP 05 Click Write to note · three-column layout for reading and note-taking
The primary path is to click Write to note below an AI bubble. To edit while you read, click Open Note in the toolbar for the three-column PDF | Note | Sidebar layout.
  1. Every AI bubble always shows three buttons below it: Copy / Write to note / Delete.
  2. Click Write to note (the one with the red border). The answer is appended to the current paper's child note; if there is no child note, one is created automatically.
  3. To read the PDF and edit the note side by side, click Open Note in the second toolbar row. The view becomes three columns: PDF Reader on the left, the Note panel in the middle (toolbar / H2 / timestamp / body), and the AI Sidebar on the right.
  4. The Note panel uses Zotero's native rich-text editor. There is no Markdown source; you edit directly, and Enter, lists, and selection all behave the Zotero way.
  5. To close the note column, click Open Note again (it now reads "Opened") to return to two columns. Toggling the note never drops chat state, drafts, or an in-progress stream.
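The "append, or create if missing" rule from step 2 can be sketched with a plain dictionary standing in for Zotero's note storage. `write_to_note` is a hypothetical helper; the timestamp-plus-model line imitates the note mockup, and the HTML paragraph layout is an assumption.

```python
import html
from datetime import datetime
from typing import Optional


def write_to_note(notes: dict, item_id: int, answer: str, model: str,
                  now: Optional[datetime] = None) -> bool:
    """Append an answer to the item's child note; return True if the note was created."""
    now = now or datetime.now()
    entry = (
        f"<p>{now:%Y-%m-%d %H:%M} · {html.escape(model)}</p>\n"
        f"<p>{html.escape(answer)}</p>\n"
    )
    created = item_id not in notes
    notes[item_id] = notes.get(item_id, "") + entry  # existing content is never overwritten
    return created
```

Appending rather than replacing keeps earlier entries intact, so repeated clicks build up a dated log of answers for the paper.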
5A Click Write to note · red box · Primary path, one click
Zotero · AI Sidebar
AI Chat · GPT (openai) · Copy MD · Clear
Open Note · Settings · Translate · Hide · Font 13px (default) · Debug
π0.5: a Vision-Language-Action Model with Open-World Generalization
Item ID: 1241
AI
Thinking & context · Full PDF 79952 chars · 9 annotations

PDF highlights done: 9 saved. One training-pipeline sentence was not found twice in a row and was dropped per your rule.

Storyline. π0.5 builds on π0 and co-trains on heterogeneous data: mobile manipulation, other robots, high-level semantic sub-tasks, language instructions, and web multimodal data. Inference predicts semantic sub-tasks first, then low-level action chunks; knowledge transfer is critical for generalization.

→ Appends to this paper's child note (creates one if missing)
5B Three columns · PDF | Note | Sidebar · Triggered by Open Note
Zotero · PDF + Note + AI
+ 3 / 18

π0.5: a VLA Model with Open-World Generalization

Abstract. We present π0.5, a Vision-Language-Action model targeting open-world generalization.

The method is evaluated on 12 household scenes covering novel objects, novel room layouts, and novel instruction compositions.

π0.5 · Reading notes
Saved
B I U H2 1. <> Zotero rich text

π0.5 reading summary

2026-05-10 14:32 · GPT-5.5

Core. π0.5 treats open-world generalization as a first-class goal, not just in-distribution benchmark scoring.

Method. Multi-home training data + inference-time decomposition of language instructions.

Limit. Data cost is high; novel homes still need more demos.

AI · GPT · Clear
Opened · Settings · Trans · Debug
Append the summary to my note.
AI
Thinking & context · 1 tool

✓ Appended to "π0.5 · Reading notes" (Method and Limit paragraphs).

Is three columns too tight? It doesn't need to stay open. When you're done, click "Opened" again to return to two columns. Chat drafts, streaming state, action buttons, and preset selection all survive the toggle.
STEP 06 PDF sentence translation mode
Click Translate at the top of the Reader to turn the mode on. Clicking a sentence shows its translation next to the original; Enter / Shift+Enter move between sentences.
  1. Go to Settings → Translation and pick the trigger (single / double click), the overlay placement (below / above the sentence), and the overlay size (compact / adaptive).
  2. Pick the context scope: sentence only (fastest) / paragraph (more context) / whole page (most accurate, most expensive).
  3. Set the sentence-navigation shortcuts; the defaults are Enter = next sentence, Shift+Enter = previous sentence.
  4. Back in the PDF Reader, click Translate at the top to enter the mode (the button turns blue).
  5. Click any sentence. It is highlighted and the translation overlay appears immediately at the configured position; press Enter to jump to the next sentence and keep reading.
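Sentence navigation and context scope can be illustrated with a small model. Everything here is hypothetical (`SentenceNavigator`, the key names, the scope strings), and the splitter is deliberately naive: it breaks after `.`, `!`, or `?` followed by whitespace, so abbreviations such as "e.g." would be split incorrectly.

```python
import re


def split_sentences(paragraph: str) -> list[str]:
    """Naive splitter: break after ., ! or ? only when whitespace follows."""
    parts = re.split(r"(?<=[.!?])\s+", paragraph.strip())
    return [p for p in parts if p]


class SentenceNavigator:
    def __init__(self, paragraph: str, page_text: str = "") -> None:
        self.paragraph = paragraph
        self.page_text = page_text or paragraph
        self.sentences = split_sentences(paragraph)
        if not self.sentences:
            raise ValueError("paragraph contains no sentences")
        self.index = 0

    def current(self) -> str:
        return self.sentences[self.index]

    def handle_key(self, key: str) -> str:
        """Enter moves forward, Shift+Enter moves back; both clamp at the ends."""
        delta = {"Enter": 1, "Shift+Enter": -1}.get(key, 0)
        self.index = max(0, min(len(self.sentences) - 1, self.index + delta))
        return self.current()

    def context(self, scope: str) -> str:
        """Text sent along with the sentence: wider scope costs more but adds context."""
        if scope == "sentence":
            return self.current()
        if scope == "paragraph":
            return self.paragraph
        if scope == "page":
            return self.page_text
        raise ValueError(f"unknown scope: {scope}")
```

Requiring whitespace after the punctuation keeps names like "π0.5" in one piece, which matters for papers whose titles contain version numbers.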
6A Translation settings · trigger / position / context / keys · Settings → Translation
Zotero · AI Sidebar Settings · Translation

Translation

Trigger
Single click · Double click
Overlay
Below · Above
Compact · Adaptive
Context scope
Sentence only · Paragraph · Whole page
Navigation
Next: Enter
Previous: Shift + Enter
Exit mode: Esc
6B PDF · translate mode · sentence highlight + overlay · Click → translate
Zotero · PDF Reader
+ Translate 3 / 18

π0.5: a VLA Model with Open-World Generalization

Most robotic learning systems are trained and evaluated within the same distribution. However, real-world deployment requires generalization to new objects, layouts, and instructions that were never seen during training.

Translation · paragraph context
然而,真实世界的部署要求模型能够泛化到训练时从未见过的新物体、新布局和新指令。
However, real-world deployment requires generalization to new objects, layouts, and instructions never seen at training time.

In this work, we present π0.5, a vision-language-action model that explicitly targets open-world generalization through multi-home training data and inference-time language decomposition. We evaluate π0.5 on twelve novel household scenes covering unseen objects and layouts.

Does Translate mode clash with AI Chat? No. Translate mode lives only in the PDF Reader; the Sidebar keeps working as usual. To have the AI quote a whole passage, exit Translate mode first, then select the passage; it lands in the orange chip on the composer.