Run the full AI reading loop from scratch
This page walks through six steps with twelve product-faithful mockups: configure a model → bind a paper → ask → audit context → write to note → PDF sentence-by-sentence translation. Every step comes with numbered instructions, and the mockups mirror the real sidebar (two-row toolbar, composer status bar, thinking-and-context block, action buttons), so what you see is what you get after installing the XPI.
- Open Zotero and switch to the AI tab in the item pane on the right of the main window. You'll see the "no preset" card.
- Click + Add preset to expand the preset editor.
- Pick a Provider (openai / anthropic), give the preset a recognizable name, paste your API Key (masked by default; click the eye icon to reveal it), and set the Base URL (the official endpoint or any OpenAI-compatible endpoint).
- In the Models area, click the models you want in the OpenAI preset models grid below, or type a name into the custom model box and click Add. Each chip can be removed individually; the bordered one at the top is the current default model.
- Adjust Max tokens (the per-response output cap) and Reasoning Summary (the thinking-length level).
- Click Test connection, wait for the green check, then click Save.
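The fields in the preset editor can be pictured as a small record. The sketch below is only an illustration of that shape: the `Preset` type, its field names, and the `validatePreset` checks are assumptions made for this example, not the plugin's actual storage format or validation logic.

```typescript
// Hypothetical shape for one model preset. Field names are illustrative only.
type Provider = "openai" | "anthropic";
type ReasoningLevel = "low" | "medium" | "high";

interface Preset {
  provider: Provider;
  name: string;
  apiKey: string;
  baseUrl: string;
  models: string[];      // chips shown in the Models area
  defaultModel: string;  // the bordered chip
  maxTokens: number;     // per-response output cap
  reasoningSummary: ReasoningLevel;
}

// Returns a list of problems; an empty list means the preset looks complete
// enough to try "Test connection". These checks are assumed, not the plugin's.
function validatePreset(p: Preset): string[] {
  const problems: string[] = [];
  if (p.name.trim() === "") problems.push("name is empty");
  if (p.apiKey.trim() === "") problems.push("API key is empty");
  if (!/^https?:\/\/\S+$/.test(p.baseUrl)) problems.push("base URL must start with http(s)://");
  if (p.models.length === 0) problems.push("no models selected");
  if (p.models.indexOf(p.defaultModel) === -1) problems.push("default model is not in the model list");
  if (p.maxTokens <= 0 || Math.floor(p.maxTokens) !== p.maxTokens) {
    problems.push("max tokens must be a positive integer");
  }
  return problems;
}

// Example values are placeholders, including the key and the token cap.
const example: Preset = {
  provider: "openai",
  name: "My OpenAI preset",
  apiKey: "sk-placeholder",
  baseUrl: "https://api.openai.com/v1",
  models: ["gpt-5.5"],
  defaultModel: "gpt-5.5",
  maxTokens: 4096,
  reasoningSummary: "medium",
};
```

Running a check like this before Test connection would catch an empty key or a default model that was removed from the chip list, without making a network call.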
- In the items list of the Zotero main window, select the paper you want to read.
- Switch to the AI tab in the right item pane. A two-row toolbar appears at the top:
  - Row 1: AI Chat tab · current preset (click to switch) · Copy MD · Clear
  - Row 2: Open Note · Settings · Translate · Hide · Font size 13px · Debug toggle
- A welcome bubble below confirms the sidebar is ready. Each paper gets its own thread, so switching papers switches the conversation.
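The "one thread per paper" rule can be sketched as a store keyed by the selected item. This is a minimal in-memory illustration: `ThreadStore`, `ChatMessage`, and the item keys are hypothetical names, and the real plugin may persist and key its threads differently.

```typescript
// Hypothetical message and store types, for illustration only.
interface ChatMessage {
  role: "user" | "assistant";
  text: string;
}

class ThreadStore {
  private threads: Record<string, ChatMessage[]> = {};
  private currentKey: string | null = null;

  // Called when a different paper is selected in the items list.
  // Creates an empty thread the first time a paper is seen.
  select(itemKey: string): void {
    if (!Object.prototype.hasOwnProperty.call(this.threads, itemKey)) {
      this.threads[itemKey] = [];
    }
    this.currentKey = itemKey;
  }

  // The thread the sidebar would render for the currently selected paper.
  current(): ChatMessage[] {
    if (this.currentKey === null) return [];
    return this.threads[this.currentKey];
  }

  // Appends a message to the current paper's thread only.
  append(message: ChatMessage): void {
    if (this.currentKey === null) throw new Error("no paper selected");
    this.threads[this.currentKey].push(message);
  }
}
```

Because the thread is looked up by item key on every selection, returning to an earlier paper brings back its conversation unchanged.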
- 📚 My Library
- 📁 VLA Papers
- 📁 Robotics
- 📁 Imitation
- 🏷️ Tags
- To ask directly, put the cursor in the composer, or click one of the quick prompts above it (Summarize / Explain method / Pull numbers).
- Select a passage in the PDF Reader. An orange selection chip appears at the top of the composer and explicitly attaches that passage to the next message (click × to drop it).
- To attach an image, click Screenshot (captures the current PDF viewport) or Image (pick a local file) in the status bar. The thumbnail appears above the input box and can be removed with ×.
- As you type, the status bar shows the cursor position as Ln · Col on the left, and on the right the current model (gpt-5.5), the Reasoning level (Low / Medium / High), and the YOLO toggle.
- Press Enter to send and Shift+Enter for a newline. The orange chip is consumed on send.
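Two behaviours in the list above are easy to pin down in code: the Ln · Col readout and the Enter / Shift+Enter split. The sketch below assumes 1-based line and column numbers and uses a minimal `KeyLike` interface in place of a browser keyboard event; both function names are hypothetical.

```typescript
// Convert a caret offset in the draft text to a 1-based line and column.
// 1-based numbering is an assumption about how the status bar counts.
function caretToLnCol(text: string, offset: number): { ln: number; col: number } {
  const clamped = Math.max(0, Math.min(offset, text.length));
  const before = text.slice(0, clamped);
  const ln = before.split("\n").length;
  // lastIndexOf returns -1 when there is no newline, which yields col = clamped + 1.
  const col = clamped - before.lastIndexOf("\n");
  return { ln, col };
}

// Minimal stand-in for the two keyboard event fields the composer needs.
interface KeyLike {
  key: string;
  shiftKey: boolean;
}

type ComposerAction = "send" | "newline" | "none";

// Enter sends, Shift+Enter inserts a newline, everything else is ignored here.
function composerAction(e: KeyLike): ComposerAction {
  if (e.key !== "Enter") return "none";
  return e.shiftKey ? "newline" : "send";
}
```

For example, with the draft `"ab\ncd"` and the caret after `c` (offset 4), `caretToLnCol` gives line 2, column 2.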
- After you send, the assistant bubble first shows a "Thinking & context" header, followed by what was actually sent this turn (for example: · Sent full PDF, 96241 chars).
- Below the header, orange chips list the context forced into this turn (full PDF text, annotations, selection, attached images, and so on).
- Next comes the tool-call list: a green dot means finished, a blue spinner means in flight. Each row shows the tool name, an argument summary, and a character count.
- While the answer streams, you can click Stop at any time (the composer status bar switches accordingly). When the turn finishes, the whole block collapses by default; click the header to expand it and read the model's reasoning summary.
- The answer body is rendered Markdown (headings, lists, code blocks, quotes, links, bold, inline code). Each trace row can also be expanded on its own to inspect the raw input/output.
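One way to picture the "Thinking & context" block is as a small per-turn record. The sketch below is an assumed data model: `TurnTrace`, `ToolCall`, the text markers, and the `read_pdf` tool name in the usage note are all hypothetical and only mirror what the list above describes.

```typescript
// Hypothetical model of one turn's trace, for illustration only.
type ToolStatus = "running" | "done";

interface ToolCall {
  name: string;         // tool name
  argsSummary: string;  // short summary of the arguments
  chars: number;        // character count shown at the end of the row
  status: ToolStatus;   // "done" = green dot, "running" = blue spinner
}

interface TurnTrace {
  forcedContext: string[];  // labels of the orange chips
  toolCalls: ToolCall[];
  streaming: boolean;
}

// Renders one row as plain text: marker, tool name, args, size.
function formatToolRow(call: ToolCall): string {
  const marker = call.status === "done" ? "[done]" : "[running]";
  return `${marker} ${call.name} · ${call.argsSummary} · ${call.chars} chars`;
}

// The block stays open while streaming and collapses once the turn finishes.
function isCollapsedByDefault(trace: TurnTrace): boolean {
  return !trace.streaming;
}
```

A finished call to a hypothetical `read_pdf` tool with the argument summary `pages 1-3` and 1200 characters would render as `[done] read_pdf · pages 1-3 · 1200 chars`.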
- Three buttons are always visible under every AI bubble: Copy / Write to note / Delete.
- Click Write to note (the one highlighted with a red border) and the answer is appended to the current paper's child note. If the paper has no child note, one is created automatically.
- To read the PDF and edit the note side by side, click Open Note in the second toolbar row. The view becomes three columns: PDF Reader on the left, Note panel in the middle (toolbar / H2 / timestamp / body), AI Sidebar on the right.
- The note panel uses Zotero's native rich-text editor: there is no Markdown source, you edit directly, and Enter, lists, and selection all behave the Zotero way.
- To close the note column, click Open Note again (it is labelled "Opened" while the panel is open) to return to two columns. Toggling the note never drops chat state, drafts, or an in-progress stream.
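The "append, or create if missing" behaviour of Write to note can be sketched against a plain in-memory model. Nothing here calls the Zotero API: `Paper`, `ChildNote`, and `writeToNote` are hypothetical, and choosing the first child note when several exist is an assumption of this sketch.

```typescript
// Hypothetical in-memory stand-ins for a paper and its child notes.
interface ChildNote {
  id: number;
  html: string;
}

interface Paper {
  key: string;
  childNotes: ChildNote[];
}

let nextNoteId = 1;

// Appends the answer to the paper's child note, creating the note first
// when the paper has none. Returns the note that received the answer.
function writeToNote(paper: Paper, answerHtml: string): ChildNote {
  // Assumption: when several child notes exist, the first one is the target.
  let note: ChildNote | undefined = paper.childNotes[0];
  if (note === undefined) {
    note = { id: nextNoteId++, html: "" };
    paper.childNotes.push(note);
  }
  note.html += answerHtml;
  return note;
}
```

Calling `writeToNote` twice on a paper that starts without notes leaves it with a single child note containing both answers in order.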
π0.5: a VLA Model with Open-World Generalization
Abstract: we present π0.5, a Vision-Language-Action model targeting open-world generalization.
The method is evaluated on 12 household scenes covering novel objects, novel room layouts, and novel instruction compositions.
π0.5 reading summary
Core. π0.5 treats open-world generalization as a first-class goal, not just in-distribution scoring.
Method. Multi-home training data plus inference-time decomposition of language instructions.
Limit. Data cost is steep; novel homes still need more demos.
Toggle Translate in the Reader, then click a sentence to translate it in place. Enter / Shift+Enter move between sentences.
- Go to Settings → Translation and pick the trigger (single click / double click), the overlay placement (below / above the sentence), and the overlay size (compact / adaptive).
- Pick the context scope: sentence only (fastest) / paragraph (better context) / whole page (most accurate, costliest).
- Set the sentence navigation shortcuts. The defaults are Enter for the next sentence and Shift+Enter for the previous one.
- Back in the PDF Reader, click Translate at the top to turn the mode on (the button turns blue).
- Click any sentence. It is highlighted and a translation overlay appears at the configured position. Press Enter to jump to the next sentence and keep reading.
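The context scope and the sentence navigation shortcuts can be sketched as two small functions. This is a minimal illustration under stated assumptions: `Sentence`, `contextFor`, and `navigate` are hypothetical names, the page is assumed to be already split into sentences tagged with a paragraph index, and stopping at the first and last sentence is a guess at the edge behaviour.

```typescript
type ContextScope = "sentence" | "paragraph" | "page";

// Hypothetical representation of one page, already split into sentences.
interface Sentence {
  text: string;
  paragraph: number;  // index of the paragraph this sentence belongs to
}

// Builds the text sent along with the clicked sentence for the chosen scope.
function contextFor(sentences: Sentence[], index: number, scope: ContextScope): string {
  if (index < 0 || index >= sentences.length) throw new RangeError("sentence index out of range");
  const current = sentences[index];
  if (scope === "sentence") return current.text;
  const picked = sentences.filter(
    (s) => scope === "page" || s.paragraph === current.paragraph
  );
  return picked.map((s) => s.text).join(" ");
}

// Minimal stand-in for the keyboard event fields used by navigation.
interface NavKey {
  key: string;
  shiftKey: boolean;
}

// Enter moves to the next sentence, Shift+Enter to the previous one.
// Assumption: navigation stops at the first and last sentence.
function navigate(index: number, count: number, e: NavKey): number {
  if (e.key !== "Enter" || count === 0) return index;
  const next = e.shiftKey ? index - 1 : index + 1;
  return Math.max(0, Math.min(count - 1, next));
}
```

The wider the scope, the more text accompanies each request, which matches the speed and cost ordering in the list above.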
Translation
π0.5: a VLA Model with Open-World Generalization
Most robotic learning systems are trained and evaluated within the same distribution. However, real-world deployment requires generalization to new objects, layouts, and instructions that were never seen during training.
In this work, we present π0.5, a vision-language-action model that explicitly targets open-world generalization through multi-home training data and inference-time language decomposition. We evaluate π0.5 on twelve novel household scenes covering unseen objects and layouts.