Zotero AI Sidebar · 5-Minute Quick Start

STEP 01 Configure your first model preset

With no preset configured, the sidebar opens straight into the add-preset form. Always run Test connection before saving, so a wrong key fails here rather than halfway through a question.

Open Zotero. In the item pane on the right of the main window, switch to the AI tab; you'll see the "No model preset yet" card.
Click + Add preset to expand the preset editor.
Pick a Provider (openai / anthropic), give the preset a recognizable name, paste your API Key (masked by default; click the eye icon to reveal it), and set the Base URL (the official endpoint or any OpenAI-compatible endpoint).
In Models: click the models you want in the OpenAI presets grid, or type an ID into the custom model box and click Add. Each chip can be removed individually; the bordered chip at the top is the current default model.
Tune Max tokens (the per-response output cap) and Reasoning Summary (the reasoning-length level).
Click Test connection, wait for the green check, then click Save.
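
The fields in the form above can be pictured as a small record plus a local sanity check that runs before any network test. This is only a sketch under stated assumptions: `ModelPreset`, `validate_preset`, `mask_key`, and every field name are hypothetical and are not taken from the plugin's code. It does not replace Test connection, which actually contacts the endpoint.

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass
class ModelPreset:
    """Hypothetical shape of one preset; field names are illustrative."""
    provider: str
    name: str
    api_key: str
    base_url: str
    models: list = field(default_factory=list)
    max_tokens: int = 4096
    reasoning: str = "medium"


def validate_preset(preset: ModelPreset) -> list:
    """Return human-readable problems; an empty list means the form is complete."""
    problems = []
    if preset.provider not in ("openai", "anthropic"):
        problems.append("provider must be openai or anthropic")
    if not preset.api_key.strip():
        problems.append("API key is empty")
    parsed = urlparse(preset.base_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append("base URL must be an http(s) URL")
    if not preset.models:
        problems.append("add at least one model")
    if preset.max_tokens <= 0:
        problems.append("max tokens must be positive")
    return problems


def mask_key(api_key: str, visible: int = 4) -> str:
    """Hide all but the last few characters, like the masked key field."""
    if len(api_key) <= visible:
        return "•" * len(api_key)
    return "•" * (len(api_key) - visible) + api_key[-visible:]
```

A complete form yields an empty problem list; a blank key or a malformed Base URL is reported before any request is made.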

1A Empty preset state · onboarding card · No key, no chat (by design)

Zotero · Item Pane

Info

Notes

AI

No model preset yet

Configure at least one provider, API key, base URL, and model ID to start chatting.

+ Add preset · Import JSON

1B Real preset form · model chips + presets grid · Save only after the green check

Zotero · AI Sidebar · Settings

GPT (openai) · Delete

Provider

openai

Name

GPT (openai)

API Key

••••••••••••••••••••sk-Xq9F 👁

Base URL

https://api.openai.com/v1

Models

gpt-5.5 × gpt-5 × gpt-4.1 × o3 ×

OpenAI presets

custom-model-id… · Add

Max tokens

4096

Reasoning

Medium (default)

Test connection · Save · ✓ Connection OK · 320 ms

Why is the API key masked by default? Keys live only in local Zotero prefs. WebDAV sync and config export both strip them, so they never follow your data to the cloud. You will need to re-enter the key on a new machine; that is intentional.
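
One way to picture "export strips the key" is a serializer that drops secret fields on the way out and leaves them blank on the way in. The sketch below assumes presets are plain dictionaries and that the secret lives under an `api_key` field; both are illustrative assumptions, not the plugin's actual storage format.

```python
import json

# Hypothetical name of the field that must never leave the machine.
SECRET_FIELDS = {"api_key"}


def export_presets(presets: list) -> str:
    """Serialize presets for sharing or sync, dropping secret fields."""
    cleaned = [
        {key: value for key, value in preset.items() if key not in SECRET_FIELDS}
        for preset in presets
    ]
    return json.dumps(cleaned, ensure_ascii=False, indent=2)


def import_presets(text: str) -> list:
    """Load exported presets; the key comes back empty and must be re-entered."""
    return [{**preset, "api_key": ""} for preset in json.loads(text)]
```

After a round trip, everything except the key survives, which matches the "re-enter on a new machine" behaviour described above.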

STEP 02 Open the sidebar and bind it to a paper

Selecting an item in the main window binds the sidebar to it. Chat history, context, and notes are stored per paper, so switching papers never leaks one paper's conversation into another's.

In the items list of the Zotero main window, select the paper you want to read.
Switch to the AI tab in the right item pane. A two-row toolbar appears at the top:
Row 1: AI Chat tab · current preset (click to switch) · Copy MD · Clear
Row 2: Open Note · Settings · Translate · Hide · Font 13px · Debug toggle
Below the toolbar, a welcome bubble confirms the binding. There is one thread per paper, so switching papers switches the conversation.
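
The "one thread per paper" rule amounts to keying chat history by item ID. The following is a minimal sketch of that idea; `ThreadStore` and its methods are hypothetical names chosen for illustration, and the real plugin may persist threads quite differently.

```python
from typing import Optional


class ThreadStore:
    """Hypothetical per-paper chat store keyed by Zotero item ID."""

    def __init__(self) -> None:
        self._threads = {}  # item_id -> list of (role, text) tuples
        self.current_item_id: Optional[int] = None

    def bind(self, item_id: int) -> None:
        """Selecting an item binds the sidebar to that item's thread."""
        self.current_item_id = item_id

    def append(self, role: str, text: str) -> None:
        """Add a message to the thread of the currently bound item."""
        if self.current_item_id is None:
            raise RuntimeError("no item selected")
        self._threads.setdefault(self.current_item_id, []).append((role, text))

    def history(self) -> list:
        """Return a copy of the bound item's thread (empty if none yet)."""
        return list(self._threads.get(self.current_item_id, []))
```

Binding to a second item shows an empty history, and binding back to the first restores its messages untouched.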

2A Main window · paper selected · Selection = binding

Zotero — My Library

Library

📚 My Library
📁 VLA Papers
📁 Robotics
📁 Imitation
🏷️ Tags

RT-2: Vision-Language-Action Models · 2023
π0.5: a Vision-Language-Action Model · 2025
OpenVLA: Open-Source VLA · 2024
Octo: Open-Source Generalist Policy · 2024

Item Pane · AI

π0.5: a Vision-Language-Action Model

Item 1241 · 2025

Ready. Once a model preset is configured, you can ask about the current Zotero item or its PDF directly.

2B Sidebar · two-row toolbar · ready · 1:1 mirror of the real product

Zotero · AI Sidebar

π0.5: a Vision-Language-Action Model with Open-World Generalization

Item ID: 1241

AI

Bound to π0.5. Ask anything; I'll look at the abstract, full PDF text, annotations, or your current selection as needed.

Ask anything… (Enter to send, Shift+Enter for newline)

Ln 1, Col 1 📷 Screenshot 🖼 Image gpt-5.5 Reasoning · Low YOLO

STEP 03 Compose the first question (selection · image · status bar)

Selecting text in the Reader raises an orange selection chip. The status bar shows the caret position, current model, Reasoning level, and YOLO toggle.

To ask directly, type into the composer, or click one of the quick prompts above it (Summarize / Explain method / Pull numbers).
Select a passage in the PDF Reader. An orange selection chip appears at the top of the composer and attaches that exact passage to the next message (click × to drop it).
To attach an image, click Screenshot (captures the current PDF viewport) or Image (picks a local file) in the status bar. The thumbnail appears above the input box and can be removed with ×.
As you type, the status bar shows the caret position (Ln · Col) on the left and, on the right, the current model (gpt-5.5), the Reasoning level (Low / Medium / High), and the YOLO toggle.
Press Enter to send and Shift+Enter for a newline. The orange chip is consumed on send.
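
The composer behaviour above (Enter sends, Shift+Enter inserts a newline, the selection chip is consumed on send) can be sketched as a small state object. `Composer` and `on_enter` are hypothetical names, and clearing the draft text and attached images after sending is an assumption of this sketch; the guide only states that the orange chip is consumed.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Composer:
    """Hypothetical composer state; names are illustrative, not the plugin's."""
    text: str = ""
    selection: Optional[str] = None             # the orange selection chip
    images: list = field(default_factory=list)  # attached thumbnails

    def on_enter(self, shift: bool) -> Optional[dict]:
        """Shift+Enter inserts a newline; plain Enter sends the message."""
        if shift:
            self.text += "\n"
            return None
        message = {
            "text": self.text,
            "selection": self.selection,
            "images": list(self.images),
        }
        # The selection chip is consumed on send (stated in the guide).
        # Clearing the text and images as well is an assumption here.
        self.text, self.selection = "", None
        self.images.clear()
        return message
```

The returned dictionary stands in for whatever payload the sidebar actually builds; the point is that the selection travels with exactly one message.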

3A Empty composer · full status bar · Status bar matches the real product

Zotero · AI Sidebar

π0.5: a Vision-Language-Action Model with Open-World Generalization

Item ID: 1241

Ask anything… (Enter to send, Shift+Enter for newline)

Ln 1, Col 1 📷 Screenshot 🖼 Image gpt-5.5 Reasoning · Low YOLO

3B Typed · selection chip + attached image · Orange = PDF selection attached

Zotero · AI Sidebar

π0.5: a Vision-Language-Action Model with Open-World Generalization

Item ID: 1241

📎 Selection: "While most robotic learning systems collapse…" ×

FIG 3 ×

Summarize this paper in 5 lines, point out its core contribution and biggest limitation, and interpret Figure 3 using the attached image.

Ln 2, Col 23 📷 Screenshot 🖼 Image gpt-5.5 Reasoning · High YOLO

STEP 04 Read the "Thinking & context" block · confirm the AI actually read the PDF

The "Thinking & context" block above every answer shows exactly what was sent this turn. A blue spinner means a tool is running; all green means done.

After you send, a "Thinking & context" header appears above the AI bubble, with a note of what was actually sent this turn (e.g. · Sent full PDF, 96241 chars).
Below the header, orange chips list the context that was force-attached this turn (full PDF, annotations, selection, images, and so on).
Below that is the tool-call list: a green dot means finished and a blue spinner means currently running. Each row shows the tool name, an argument summary, and a character count.
While streaming, you can click Stop at any time (the composer status bar switches to show it). When the turn finishes, the whole block collapses by default; click the header to expand it and read the model's reasoning summary.
The answer body is rendered Markdown (headings, lists, code blocks, quotes, links, bold, inline code). Each trace row can be expanded on its own to inspect its raw input/output.
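
The trace list can be read as a sequence of tool-call records with a status each. The sketch below shows one way to summarize such a list; the `ToolCall` record and the `FAILED` status are assumptions made for illustration (the guide itself only names the green "finished" and blue "running" states), while the tool names are the ones shown in the trace.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    RUNNING = "running"  # blue spinner
    DONE = "done"        # green dot
    FAILED = "failed"    # assumed extra state, e.g. a passage that was not found


@dataclass
class ToolCall:
    """Hypothetical trace row: tool name, argument summary, status."""
    name: str
    args_summary: str
    status: Status


def trace_state(calls: list) -> str:
    """Summarize a trace: still running, all green, or finished with failures."""
    if any(call.status is Status.RUNNING for call in calls):
        return "in flight"
    if all(call.status is Status.DONE for call in calls):
        return "all green"
    return "finished with failures"
```

A trace with any running row counts as in flight, regardless of how many rows have already finished.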

4A Streaming · "Thinking & context" expanded · Blue = tool in flight

Zotero · AI Sidebar

π0.5: a Vision-Language-Action Model with Open-World Generalization

Item ID: 1241

Summarize this paper's core idea and experimental storyline, and highlight the key training-pipeline sentences on the PDF.

AI

Thinking & context · Sent full PDF, 79952 chars

Full PDF · 79952 chars sent this turn

zotero_get_current_item · read item metadata

zotero_get_reader_pdf_text · read Reader PDF text 79952/79952

zotero_annotate_passage · p.1 highlight +23 chars

zotero_annotate_passage · p.1 highlight +20 chars

zotero_annotate_passage · p.2 highlight +19 chars

zotero_annotate_passage · p.8 highlight …

Annotating · writing PDF highlights…

Streaming…

⏹ Stop gpt-5.5 Reasoning · High YOLO

4B Done · Markdown body · Action buttons always visible below the bubble

Zotero · AI Sidebar

π0.5: a Vision-Language-Action Model with Open-World Generalization

Item ID: 1241

AI

Thinking & context · Sent full PDF, 79952 chars

Full PDF · 79952 chars sent this turn

zotero_get_current_item · read item metadata

zotero_get_reader_pdf_text · read Reader PDF text 79952/79952

zotero_annotate_passage · p.1 highlight +23 chars

zotero_annotate_passage · p.1 highlight +20 chars

zotero_annotate_passage · p.2 highlight +19 chars

zotero_annotate_passage · Passage not found in PDF (or low co…)

zotero_annotate_passage · p.8 highlight +18 chars

zotero_annotate_passage · p.11 highlight +14 chars

PDF highlights done: 9 saved. One training-pipeline sentence was not found twice in a row and was dropped, per your rule.

Storyline. The paper asks whether VLA robot models can generalize to open, real-world home environments outside the lab. The authors propose π0.5, which builds on π0 by co-training on heterogeneous data: mobile manipulation data, data from other robots, high-level semantic subtask prediction, language instructions, and web multimodal data. At inference time the model is hierarchical: it first predicts a semantic subtask, then generates low-level action chunks. Experiments show that this knowledge transfer is critical for generalization; π0.5 can perform long-horizon, multi-stage, dexterous tasks such as cleaning a kitchen or bedroom in new homes unseen during training.

Highlights · done · View

Ask anything… (Enter to send, Shift+Enter for newline)

Ln 1, Col 1 📷 🖼 gpt-5.5 Reasoning · High YOLO

Is an all-green trace enough to trust the answer? Not necessarily. A green dot means the tool call succeeded, not that the model used the returned value correctly. When you doubt a conclusion, go to the corresponding tool row and expand it to compare its input/output against the source PDF.
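
Comparing a quoted passage against the source can also be done mechanically once you have copied the row's output and the PDF text. The sketch below is a simple, whitespace-insensitive substring check; it is an illustrative helper, not a feature of the sidebar, and it will miss paraphrases or text broken by hyphenation.

```python
import re


def normalize(text: str) -> str:
    """Collapse runs of whitespace and lowercase, so line breaks do not matter."""
    return re.sub(r"\s+", " ", text).strip().lower()


def passage_in_source(passage: str, source: str) -> bool:
    """True if the quoted passage appears verbatim (modulo whitespace and case)."""
    needle = normalize(passage)
    return bool(needle) and needle in normalize(source)
```

A `False` result is a prompt to look closer, not proof that the model was wrong.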

STEP 05 Click Write to note · three-column layout for reading and note-taking

The primary path is to click Write to note below the AI bubble. To edit while you read, click Open Note in the toolbar for the three-column PDF | Note | Sidebar layout.

Every AI bubble always shows three buttons below it: Copy / Write to note / Delete.
Click Write to note (the one highlighted with a red box). The answer is appended to the current paper's child note; if there is no child note, one is created automatically.
To edit the note yourself while reading the PDF, click Open Note in the second toolbar row. The view becomes three columns: PDF Reader on the left, the note panel in the middle (toolbar / H2 / timestamp / body), and the AI Sidebar on the right.
The note panel uses Zotero's native rich-text editor. There is no Markdown source; you edit directly, and Enter, lists, and selection all behave the Zotero way.
To close the note column, click Open Note again (it is labelled "Opened" while the column is open) to return to two columns. Toggling the note column never drops chat state, drafts, or an in-progress stream.
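
The "append, or create if missing" rule for Write to note can be sketched with a dictionary standing in for child notes. Everything here is hypothetical: `write_to_note`, the `notes` mapping, and the plain-text entry format (a timestamp and model line followed by the answer, loosely modelled on the note panel mockup) are illustrative only, and the real note is rich text managed by Zotero.

```python
from datetime import datetime
from typing import Optional


def write_to_note(notes: dict, item_id: int, answer: str, model: str,
                  now: Optional[datetime] = None) -> str:
    """Append an answer to the item's child note, creating the note if missing.

    `notes` maps item ID -> note text and stands in for Zotero child notes.
    """
    now = now or datetime.now()
    entry = f"{now:%Y-%m-%d %H:%M} · {model}\n{answer}"
    existing = notes.get(item_id)
    notes[item_id] = entry if existing is None else existing + "\n\n" + entry
    return notes[item_id]
```

Calling it twice for the same item keeps the first entry and adds the second below it, which is the append-only behaviour the step describes.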

5A Click Write to note · highlighted with a red box · Primary path, one click

Zotero · AI Sidebar

π0.5: a Vision-Language-Action Model with Open-World Generalization

Item ID: 1241

AI

Thinking & context · Full PDF 79952 chars · 9 annotations

PDF highlights done: 9 saved. One training-pipeline sentence was not found twice in a row and was dropped, per your rule.

Storyline. π0.5 builds on π0 by co-training on heterogeneous data: mobile manipulation, data from other robots, high-level semantic subtasks, language instructions, and web multimodal data. At inference it first predicts a semantic subtask, then generates low-level action chunks; knowledge transfer is critical for generalization.

→ Appends to this paper's child note (creates one if missing)

5B Three-column layout · PDF | Note | Sidebar · Triggered by Open Note

Zotero · PDF + Note + AI

π0.5: a VLA Model with Open-World Generalization

Abstract: we present π0.5, a Vision-Language-Action model targeting open-world generalization.

The method is evaluated on 12 household scenes covering novel objects, novel room layouts, and novel instruction combinations.

π0.5 · Reading notes

✓ Saved

B I U H2 • 1. ” <> Zotero rich text

π0.5 reading summary

2026-05-10 14:32 · GPT-5.5

Core finding. π0.5 treats open-world generalization as a first-class goal, not just in-distribution benchmark scores.

Method highlights. Multi-home training data plus inference-time decomposition of language instructions.

Limitations. Data cost is high, and new homes still need more demos.

Append the summary you just gave to my note.

AI

Thinking & context · 1 tool

✓ Appended to "π0.5 · Reading notes" (the Method highlights and Limitations paragraphs).

Is three columns too cramped? The note column does not need to stay open. When you are done, click "Opened" again to return to two columns. Chat drafts, streaming state, action buttons, and preset selection are not lost when the note column is toggled.

STEP 06 PDF sentence-by-sentence translation mode

Click Translate at the top of the Reader to turn the mode on. Click a sentence and its translation appears next to the original; Enter / Shift+Enter move between sentences.

First go to Settings → Translation and pick the trigger (single click / double click), the overlay position (below / above the sentence), and the overlay size (compact / adaptive).
Pick the context scope: sentence only (fastest) / paragraph (more context) / whole page (most accurate, but most expensive).
Set the sentence navigation shortcuts. By default, Enter goes to the next sentence and Shift+Enter to the previous one.
Back in the PDF Reader, click Translate at the top to turn the mode on (the button turns blue).
Click any sentence. It is highlighted and the translation overlay appears immediately at the configured position. Press Enter to jump to the next sentence and keep reading.
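
The three moving parts of this step (splitting a paragraph into sentences, choosing how much context to send, and stepping with Enter / Shift+Enter) can be sketched as follows. The regex-based splitter, the function names, and the clamping at the first and last sentence are all assumptions made for illustration; the plugin's own sentence detection is not documented here.

```python
import re


def split_sentences(paragraph: str) -> list:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]


def context_for(scope: str, sentence: str, paragraph: str, page: str) -> str:
    """Pick the text sent along with the sentence for the chosen context scope."""
    return {"sentence": sentence, "paragraph": paragraph, "page": page}[scope]


def step(index: int, count: int, shift: bool = False) -> int:
    """Enter moves to the next sentence, Shift+Enter to the previous one.

    Clamping at both ends (rather than wrapping) is an assumption.
    """
    delta = -1 if shift else 1
    return max(0, min(count - 1, index + delta))
```

A wider scope gives the translator more surrounding text at the cost of a larger request, which is the trade-off the context scope setting exposes.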

6A Translation settings · trigger / position / context / shortcuts · Settings → Translation

Zotero · AI Sidebar Settings · Translation

Translation

Trigger

Single click · Double click

Overlay

Below sentence · Above sentence

Compact · Adaptive

Context scope

Sentence only · Paragraph · Whole page

Navigation

Next: Enter

Previous: Shift + Enter

Exit mode: Esc

6B PDF · Translate mode · sentence highlight + overlay · Click a sentence to translate

Zotero · PDF Reader

π0.5: a VLA Model with Open-World Generalization

Most robotic learning systems are trained and evaluated within the same distribution. However, real-world deployment requires generalization to new objects, layouts, and instructions that were never seen during training.

Translation · paragraph context

然而,真实世界的部署要求模型能够泛化到训练时从未见过的新物体、新布局和新指令。

However, real-world deployment requires generalization to new objects, layouts, and instructions never seen at training time.

Enter ▸ next · Shift+Enter ◂ prev · Esc exit

In this work, we present π0.5, a vision-language-action model that explicitly targets open-world generalization through multi-home training data and inference-time language decomposition. We evaluate π0.5 on twelve novel household scenes covering unseen objects and layouts.

Does Translate mode conflict with AI Chat? No. Translate mode only occupies the PDF Reader, and the Sidebar keeps working as usual. To have the AI quote a whole passage, turn Translate mode off first, then select the passage; it appears automatically in the orange chip on the composer.

Run the full AI reading loop from scratch
