What happened today
Today's most prominent external discussion comes from Hacker News and r/LocalLLaMA, with attention mainly on models and toolchains.
The signal most worth reading first is "LiteLLM Python package compromise...", which best captures what the community is chasing today.
If you only read above the fold, remember this: the strongest external signal is "LiteLLM Python package comp...", on the product side start with dify, and models and toolchains are today's clearest through-line.
New products worth watching
Today's product signal most worth opening is dify, a GitHub project oriented toward agents and coding.
If you want one more, n8n rounds out the agent-and-coding angle.
Today's trends
The structural trend today: models, toolchains, and agents continue to dominate, suggesting attention is consolidating around a more deployable AI capability stack.
By source, GitHub Search, r/LocalLLaMA, and GitHub Skills Radar contributed the most signals, so the trend is not confined to a single community.
A total of 28 items were collected today. Search and filtering both run locally, so skimming stays quick.
Aggregates the open-source AI projects whose momentum rose fastest over the last 24 hours.
Summary: an open-source AI project oriented toward models and toolchains, covering multimodal capabilities, machine learning, and framework features, for both model inference and training.
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
Summary: an open-source AI project oriented toward agents and coding, focused on agentic workflows, production readiness, and platform capabilities.
Production-ready platform for agentic workflow development.
Summary: an open-source AI project oriented toward agents and models, focused on a user-facing interface and API integration.
User-friendly AI Interface (Supports Ollama, OpenAI API, ...)
Summary: an open-source AI project oriented toward agents and models, focused on agent engineering and platform capabilities.
The agent engineering platform
Summary: an open-source AI project oriented toward models and infrastructure, focused on inference and serving with high throughput and memory efficiency, with an emphasis on LLMs.
A high-throughput and memory-efficient inference and serving engine for LLMs
Summary: an open-source AI project oriented toward agents and models, focused on the open-source ecosystem, capability packaging, and LLMs.
RAGFlow is a leading open-source Retrieval-Augmented Generation (RAG) engine that fuses cutting-edge RAG with Agent capabilities to create a superior context layer for LLMs
Summary: an open-source AI project oriented toward models, focused on model inference and LLMs.
LLM inference in C/C++
Summary: an open-source AI project oriented toward models and toolchains, focused on toolkits and LLMs.
Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.
Collects high-signal AI discussion and news from stable public sources.
Summary: an AI discussion from Hacker News around "LiteLLM Python package compromised by supply-chain at...", focused on LLMs.
High-signal AI discussion trending on Hacker News.
Summary: an AI discussion from Hacker News around "iPhone 17 Pro Demonstrated Running a 400B LLM", focused on LLMs.
High-signal AI discussion trending on Hacker News.
Summary: an AI discussion from r/LocalLLaMA around "Created a SillyTavern extension that brings NPC's to ...", focused on a product release, plugin extensions, and LLMs.
Using SillyTavern as the backend for all the RP means it can work with almost any game, with just a small mod acting as a bridge between them. Right now I'm using Cydonia as the RP model and Qwen 3.5 0.8B as the game master. Everything is running locally.

The idea is that you can take any game, download its entire wiki, and feed it into SillyTavern. Then every character has their own full lore, relationships, opinions, etc., and can respond appropriately. On top of that, every voice is automatically cloned using the game's files and mapped to each NPC. The NPCs can also be fed as much information per turn as you want about the game world - like their current location, player stats, player HP, etc.

All RP happens inside SillyTavern, and the model is never even told it's part of a game world. Paired with a locally run RP-tuned model like Cydonia, this gives great results with low latency, as well as strong narration of physical actions.

A second pass is then run over each message using a small model (currently Qwen 3.5 0.8B) with structured output. This maps responses to actual in-game actions exposed by your mod. For example, in this video I approached an NPC and only sent "*shoots at you*". The NPC then narrated themselves shooting back at me. Qwen 3.5 reads this conversation and decides that the correct action is for the NPC to shoot back at the player. Essentially, the tiny model acts as a game master, deciding which actions should map to which functions in-game. This means the RP can flow freely without being constrained to a strict structure, which leads to much better results.

In older games, this could add a lot more life even without the conversational aspect. NPCs simply reacting to your actions adds a ton of depth.

Not sure why this isn't more popular. My guess is that most people don't realise how good highly specialised, fine-tuned RP models can be compared to base models. I was honestly blown away when I started experimenting with them while building this.
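The two-pass design the post describes can be sketched roughly as follows. This is a toy sketch: the action names and the keyword heuristic standing in for the small structured-output model are assumptions for illustration, not details from the post; the real pipeline would send the conversation to Qwen 3.5 0.8B with a JSON schema constraining the output.

```python
import json

# Actions the game mod exposes; these names are hypothetical.
ALLOWED_ACTIONS = ["shoot_player", "flee", "talk", "idle"]

def game_master(narration: str, allowed=ALLOWED_ACTIONS) -> str:
    """Second pass: map a free-form RP reply to a concrete in-game action.

    A keyword heuristic stands in for the small structured-output model
    (Qwen 3.5 0.8B in the post).
    """
    text = narration.lower()
    if "shoot" in text or "fires" in text:
        action = "shoot_player"
    elif "runs" in text or "flees" in text:
        action = "flee"
    elif '"' in narration:  # spoken dialogue
        action = "talk"
    else:
        action = "idle"
    # Structured output guarantees membership in the real pipeline.
    assert action in allowed
    return json.dumps({"action": action})

# e.g. the NPC narrates shooting back at the player:
print(game_master("*shoots back at you*"))  # {"action": "shoot_player"}
```

The key property is that the RP model never sees the action schema; only the second, cheap pass is constrained, so the roleplay stays free-form.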
Summary: an AI discussion from r/LocalLLaMA around "RYS II - Repeated layers with Qwen3.5 27B and some hi...", focused on LLMs.
So, I've had my H100s grinding for you all, and have some interesting new results AND fresh models! So, what did I find? Well, because my blog articles are too damn long (*I know some of you are not reading the whole thing...*), here is a **TL;DR**:

1. I found that LLMs seem to *think in a universal language*. During the middle layers, the models' latent representations are more similar for the same content in Chinese and English than for different content in the same language.
2. I tried a bunch of different stuff, but in the end, repeating blocks in the middle of the transformer stack works the best.
3. You should still read the blog: [https://dnhkng.github.io/posts/rys-ii/](https://dnhkng.github.io/posts/rys-ii/)

If you still didn't read the blog, well, I guess you can just try the models?

[https://huggingface.co/dnhkng/RYS-Qwen3.5-27B-FP8-S](https://huggingface.co/dnhkng/RYS-Qwen3.5-27B-FP8-S)
[https://huggingface.co/dnhkng/RYS-Qwen3.5-27B-FP8-M](https://huggingface.co/dnhkng/RYS-Qwen3.5-27B-FP8-M)
[https://huggingface.co/dnhkng/RYS-Qwen3.5-27B-FP8-L](https://huggingface.co/dnhkng/RYS-Qwen3.5-27B-FP8-L)
[https://huggingface.co/dnhkng/RYS-Qwen3.5-27B-FP8-XL](https://huggingface.co/dnhkng/RYS-Qwen3.5-27B-FP8-XL)

Wen GGUF? *When someone GGUFs them, I guess?*

When you repeat layers, you benefit a lot from fine-tuning. I expect the first team to fine-tune RYS-Qwen3.5-27B-FP8-XL will have a new SOTA for that size range.

Lastly, I've been chatting with TurboDerp; hopefully we can get this into a new format where you can keep the duplicated layers as copies and not use more VRAM (except for the KV cache). *Stay tuned!*
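The layer-repetition idea in point 2 of the TL;DR can be sketched as a list operation over decoder blocks. This is a toy sketch under the assumption that the model's blocks form an ordered list; the specific indices are illustrative, and in a real model you would duplicate (or share) the corresponding weight modules inside the transformer stack.

```python
def repeat_middle_layers(layers, start, end, times=2):
    """Return a new layer list with layers[start:end] repeated `times` times."""
    middle = layers[start:end]
    return layers[:start] + middle * times + layers[end:]

# A 6-block stack with blocks 2..3 repeated twice:
blocks = [0, 1, 2, 3, 4, 5]
expanded = repeat_middle_layers(blocks, 2, 4, times=2)
# expanded == [0, 1, 2, 3, 2, 3, 4, 5]
```

The format the post hopes for with TurboDerp would keep the repeated entries as references to the same weights, so only the KV cache grows.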
Summary: an AI discussion from r/LocalLLaMA around "Litellm 1.82.7 and 1.82.8 on PyPI are compromised, do...", focused on LLMs.
We have just been compromised, and thousands of people likely are as well. More details are being updated here: [https://futuresearch.ai/blog/litellm-pypi-supply-chain-attack/](https://futuresearch.ai/blog/litellm-pypi-supply-chain-attack/)
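For a quick local triage, a minimal sketch using only the standard library; the two version numbers come from the post title, and the linked advisory remains the authoritative source for remediation steps.

```python
from importlib import metadata

# The two releases named in the post title.
COMPROMISED = {"1.82.7", "1.82.8"}

def is_compromised(version: str) -> bool:
    return version in COMPROMISED

def check_installed() -> bool:
    """True if the locally installed litellm is one of the bad releases."""
    try:
        return is_compromised(metadata.version("litellm"))
    except metadata.PackageNotFoundError:
        return False  # litellm not installed in this environment
```

If the check comes back positive, treat credentials reachable from that environment as exposed rather than merely downgrading the package.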
Summary: an AI discussion from r/LocalLLaMA around "China's open-source dominance threatens US AI lead, U...", focused on the open-source ecosystem.
AI discussion from r/LocalLLaMA.
Summary: an AI discussion from r/LocalLLaMA around "FlashAttention-4: 1613 TFLOPs/s, 2.7x faster than Tri...", focused on model inference, research progress, and a product release.
Wrote a deep dive on **FlashAttention-4 (03/05/2026)** that's relevant for anyone thinking about inference performance.

**TL;DR for inference:**

* **BF16 forward: 1,613 TFLOPs/s on B200 (71% utilization). Attention is basically at matmul speed now.**
* **2.1-2.7x faster than Triton, up to 1.3x faster than cuDNN 9.13**
* **vLLM 0.17.0 (released March 7) integrates FA-4. If you're on B200, it's automatic.**
* **PyTorch FlexAttention also has an FA-4 backend (1.2-3.2x over the Triton backend)**
* **GQA and MQA fully supported (Llama, Mistral, Qwen, Gemma all work)**
* **Sliding window available via the window_size parameter**

**Bad news for most of us:** FA-4 is Hopper + Blackwell only. It works on H100/H800 and B200/B100, not on A100 or consumer cards. The optimizations exploit specific Blackwell hardware features (TMEM, 2-CTA MMA, async TMA) that don't exist on older GPUs.

**If you're on A100**: stay on FA-2. **If you're on H100**: FA-4 is supported, but gains are smaller than on Blackwell; worth testing. **If you're on B200**: just update vLLM and you're good.

*The article breaks down why softmax (not matmul) is now the bottleneck on Blackwell, how selective rescaling skips ~10x of the softmax correction work, and the full 5-stage pipeline architecture.*

*It also covers the Python angle: FA-4 is 100% CuTe-DSL (NVIDIA's Python kernel DSL). It compiles in 2.5 seconds vs 55 seconds for the C++ equivalent, with the same runtime perf. That's a big deal for kernel iteration speed.*

**Paper**: [https://arxiv.org/abs/2603.05451](https://arxiv.org/abs/2603.05451)
**Article free link**: [https://medium.com/ai-advances/flashattention-4-python-gpu-kernel-blackwell-2b18f51c8b32?sk=59bca93c369143e5f74fb0f86e57e6d0](https://medium.com/ai-advances/flashattention-4-python-gpu-kernel-blackwell-2b18f51c8b32?sk=59bca93c369143e5f74fb0f86e57e6d0)

**For those running local models:** the algorithmic ideas (selective rescaling, software-emulated exp) will likely trickle down to consumer GPUs eventually. The CuTe-DSL tooling is the real unlock for faster kernel development across the board.
Summary: an AI discussion from r/LocalLLaMA around "Are we currently in a "Golden Time" for low VRAM/1 GP...", focused on LLMs.
Really loving Qwen 27B more than any other LLM I can remember. It works so well. I have 48 GB of VRAM; can anyone recommend alternatives? It seems 24 GB is enough, and currently I can't think of any other open model to use.
Summary: an AI discussion from r/LocalLLaMA around "Which local model we running on the overland Jeep fel...", focused on LLMs.
AI discussion from r/LocalLLaMA.
Summary: an AI discussion from Hacker News around "NanoClaw Adopts OneCLI Agent Vault"; its visibility has been amplified by the leaderboard.
High-signal AI discussion trending on Hacker News.
Summary: an AI discussion from r/LocalLLaMA around "Another appreciation post for qwen3.5 27b model", focused on API integration and LLMs.
I tested qwen3.5 122b when it came out. I really liked it, and for my development tests it was on par with gemini 3 flash (my current AI tool for coding), so I was looking at investing in hardware; the problem is I need a new mobo and 1 (or 2) more 3090s, and the price is just too high right now.

I saw a lot of posts saying that qwen3.5 27b was better than 122b, which actually didn't make sense to me. Then I saw nemotron 3 super 120b, but people said it was not better than qwen3.5 122b, and I trusted them. Yesterday and today I tested all these models:

> "unsloth/Qwen3.5-27B-GGUF:UD-Q4_K_XL"
> "unsloth/Qwen3.5-35B-A3B-GGUF:UD-Q4_K_XL"
> "unsloth/Qwen3.5-122B-A10B-GGUF"
> "unsloth/Qwen3.5-27B-GGUF:UD-Q6_K_XL"
> "unsloth/Qwen3.5-27B-GGUF:UD-Q8_K_XL"
> "unsloth/NVIDIA-Nemotron-3-Super-120B-A12B-GGUF:UD-IQ4_XS"
> "unsloth/gpt-oss-120b-GGUF:F16"

I also tested against gpt-5.4 high so I could compare them better. To my surprise, nemotron was a very, very good model, on par with gpt-5.4, and qwen3.5-27b did great as well. Sadly (but also good), gpt-oss 120b and qwen3.5 122b performed worse than the other two models (good because they need more hardware). So I can finally use "Qwen3.5-27B-GGUF:UD-Q6_K_XL" for real development tasks locally; the best part is I don't need to get more hardware (I already own 2x 3090).

I am sorry for not providing more info, but I didn't save the tg/pp for all of them. Nemotron ran at 80 tg and about 2000 pp with 100k context on [vast.ai](http://vast.ai) with 4x RTX 3090, and Qwen3.5-27B Q6 at 803 pp, 25 tg with 256k context on [vast.ai](http://vast.ai) as well. I'll set it up locally, probably next week, for production use.

This is the command I used (pretty much copied from the unsloth page):

```
./llama.cpp/llama-server -hf unsloth/Qwen3.5-27B-GGUF:UD-Q6_K_XL --ctx-size 262144 --temp 0.6 --top-p 0.95 --top-k 20 --min-p 0.00 -ngl 999
```

P.D. I am so glad I can actually replace API subscriptions (at least for daily tasks); I'll continue using CODEX for complex tasks. If I had the hardware that nemotron-3-super 120b requires, I would use it instead; it also always responded in my own language (Spanish), while the others responded in English.
Summary: an AI discussion from r/artificial around "Mark Zuckerberg builds AI CEO to help him run Meta"; its visibility has been amplified by the leaderboard.
AI discussion from r/artificial.
Focuses on capability packs, MCP capabilities, and workflow skills that AI agent builders can reuse.
Summary: a reusable Agent Skill suited to agents and coding, mainly for platform capabilities, automation, and capability packaging.
Fair-code workflow automation platform with native AI capabilities. Combine visual building with custom code, self-host or cloud, 400+ integrations.
Summary: a reusable Agent Skill suited to agents and toolchains.
AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters.
Summary: a reusable Agent Skill suited to agents and toolchains, mainly for platform and assistant capabilities.
Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞
Summary: a reusable Agent Skill suited to coding and infrastructure, a good complement to your own agent workflows.
Interactive roadmaps, guides and other educational content to help developers grow in their careers.
Summary: a reusable Agent Skill suited to toolchains, mainly for machine learning, the open-source ecosystem, and framework capabilities.
An Open Source Machine Learning Framework for Everyone
Summary: a reusable Agent Skill suited to toolchains and design, a good complement to your own agent workflows.
Flutter makes it easy and fast to build beautiful apps for mobile and beyond
Summary: a reusable Agent Skill suited to toolchains, a good complement to your own agent workflows.
An opinionated list of Python frameworks, libraries, tools, and resources.
Summary: a reusable Agent Skill suited to toolchains, a good complement to your own agent workflows.
A list of Free Software network services and web applications which can be hosted on your own servers