What happened today
Today's most prominent external discussion comes from Hacker News and r/LocalLLaMA, with attention mainly on models and toolchains.
The signal most worth reading first is "LiteLLM Python package compromise...", which best captures what the community is chasing today.
If you only read above the fold, remember this: the strongest external signal is "LiteLLM Python package comp...", on the product side start with dify, and models and toolchains are today's clearest through-line.
New products worth watching
Today's product signal most worth opening is dify, a GitHub project oriented toward agents and coding.
If you want one more, n8n rounds out the agent-and-coding angle.
Today's trends
The structural trend today: models, toolchains, and agents continue to dominate, suggesting attention is consolidating around a more deployable AI capability stack.
By source, GitHub Search, r/LocalLLaMA, and GitHub Skills Radar contributed the most signals, so the trend is not confined to a single community.
A total of 28 items were collected today. Search and filtering both run locally, so skimming stays quick.
Aggregates the open-source AI projects whose momentum rose fastest over the last 24 hours.
Summary: an open-source AI project oriented toward models and toolchains, covering multimodal capabilities, machine learning, and framework features, for both model inference and training.
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
Summary: an open-source AI project oriented toward agents and coding, focused on agentic workflows, production readiness, and platform capabilities.
Production-ready platform for agentic workflow development.
Summary: an open-source AI project oriented toward agents and models, focused on a user-facing interface and API integration.
User-friendly AI Interface (Supports Ollama, OpenAI API, ...)
Summary: an open-source AI project oriented toward agents and models, focused on agent engineering and platform capabilities.
The agent engineering platform
Summary: an open-source AI project oriented toward models and infrastructure, focused on inference and serving with high throughput and memory efficiency, with an emphasis on LLMs.
A high-throughput and memory-efficient inference and serving engine for LLMs
Summary: an open-source AI project oriented toward agents and models, focused on the open-source ecosystem, capability packaging, and LLMs.
RAGFlow is a leading open-source Retrieval-Augmented Generation (RAG) engine that fuses cutting-edge RAG with Agent capabilities to create a superior context layer for LLMs
Summary: an open-source AI project oriented toward models, focused on model inference and LLMs.
LLM inference in C/C++
Summary: an open-source AI project oriented toward models and toolchains, focused on toolkits and LLMs.
Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.
Collects high-signal AI discussion and news from stable public sources.
Summary: an AI discussion from Hacker News around "LiteLLM Python package compromised by supply-chain at...", focused on LLMs.
High-signal AI discussion trending on Hacker News.
Summary: an AI discussion from Hacker News around "iPhone 17 Pro Demonstrated Running a 400B LLM", focused on LLMs.
High-signal AI discussion trending on Hacker News.
Summary: an AI discussion from r/LocalLLaMA around "Created a SillyTavern extension that brings NPC's to ...", focused on a product release, plugin extensions, and LLMs.
Using SillyTavern as the backend for all the RP means it can work with almost any game, with just a small mod acting as a bridge between them. Right now I'm using Cydonia as the RP model and Qwen 3.5 0.8B as the game master. Everything is running locally.

The idea is that you can take any game, download its entire wiki, and feed it into SillyTavern. Then every character has their own full lore, relationships, opinions, etc., and can respond appropriately. On top of that, every voice is automatically cloned using the game's files and mapped to each NPC. The NPCs can also be fed as much information per turn as you want about the game world - like their current location, player stats, player HP, etc.

All RP happens inside SillyTavern, and the model is never even told it's part of a game world. Paired with a locally run RP-tuned model like Cydonia, this gives great results with low latency, as well as strong narration of physical actions.

A second pass is then run over each message using a small model (currently Qwen 3.5 0.8B) with structured output. This maps responses to actual in-game actions exposed by your mod. For example, in this video I approached an NPC and only sent "*shoots at you*". The NPC then narrated themselves shooting back at me. Qwen 3.5 reads this conversation and decides that the correct action is for the NPC to shoot back at the player. Essentially, the tiny model acts as a game master, deciding which actions should map to which functions in-game. This means the RP can flow freely without being constrained to a strict structure, which leads to much better results.

In older games, this could add a lot more life even without the conversational aspect. NPCs simply reacting to your actions adds a ton of depth.

Not sure why this isn't more popular. My guess is that most people don't realise how good highly specialised, fine-tuned RP models can be compared to base models. I was honestly blown away when I started experimenting with them while building this.
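The two-pass design the post describes can be sketched roughly as follows. This is a toy sketch: the action names and the keyword heuristic standing in for the small structured-output model are assumptions for illustration, not details from the post; the real pipeline would send the conversation to Qwen 3.5 0.8B with a JSON schema constraining the output.

```python
import json

# Actions the game mod exposes; these names are hypothetical.
ALLOWED_ACTIONS = ["shoot_player", "flee", "talk", "idle"]

def game_master(narration: str, allowed=ALLOWED_ACTIONS) -> str:
    """Second pass: map a free-form RP reply to a concrete in-game action.

    A keyword heuristic stands in for the small structured-output model
    (Qwen 3.5 0.8B in the post).
    """
    text = narration.lower()
    if "shoot" in text or "fires" in text:
        action = "shoot_player"
    elif "runs" in text or "flees" in text:
        action = "flee"
    elif '"' in narration:  # spoken dialogue
        action = "talk"
    else:
        action = "idle"
    # Structured output guarantees membership in the real pipeline.
    assert action in allowed
    return json.dumps({"action": action})

# e.g. the NPC narrates shooting back at the player:
print(game_master("*shoots back at you*"))  # {"action": "shoot_player"}
```

The key property is that the RP model never sees the action schema; only the second, cheap pass is constrained, so the roleplay stays free-form.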
Summary: an AI discussion from r/LocalLLaMA around "RYS II - Repeated layers with Qwen3.5 27B and some hi...", focused on LLMs.
So, I've had my H100s grinding for you all, and have some interesting new results AND fresh models! So, what did I find? Well, because my blog articles are too damn long (*I know some of you are not reading the whole thing...*), here is a **TL;DR**:

1. I found that LLMs seem to *think in a universal language*. During the middle layers, the models' latent representations are more similar for the same content in Chinese and English than for different content in the same language.
2. I tried a bunch of different stuff, but in the end, repeating blocks in the middle of the transformer stack works the best.
3. You should still read the blog: [https://dnhkng.github.io/posts/rys-ii/](https://dnhkng.github.io/posts/rys-ii/)

If you still didn't read the blog, well, I guess you can just try the models?

[https://huggingface.co/dnhkng/RYS-Qwen3.5-27B-FP8-S](https://huggingface.co/dnhkng/RYS-Qwen3.5-27B-FP8-S)
[https://huggingface.co/dnhkng/RYS-Qwen3.5-27B-FP8-M](https://huggingface.co/dnhkng/RYS-Qwen3.5-27B-FP8-M)
[https://huggingface.co/dnhkng/RYS-Qwen3.5-27B-FP8-L](https://huggingface.co/dnhkng/RYS-Qwen3.5-27B-FP8-L)
[https://huggingface.co/dnhkng/RYS-Qwen3.5-27B-FP8-XL](https://huggingface.co/dnhkng/RYS-Qwen3.5-27B-FP8-XL)

Wen GGUF? *When someone GGUFs them, I guess?*

When you repeat layers, you benefit a lot from fine-tuning. I expect the first team to fine-tune RYS-Qwen3.5-27B-FP8-XL will have a new SOTA for that size range.

Lastly, I've been chatting with TurboDerp; hopefully we can get this into a new format where you can keep the duplicated layers as copies and not use more VRAM (except for the KV cache). *Stay tuned!*
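The layer-repetition idea in point 2 of the TL;DR can be sketched as a list operation over decoder blocks. This is a toy sketch under the assumption that the model's blocks form an ordered list; the specific indices are illustrative, and in a real model you would duplicate (or share) the corresponding weight modules inside the transformer stack.

```python
def repeat_middle_layers(layers, start, end, times=2):
    """Return a new layer list with layers[start:end] repeated `times` times."""
    middle = layers[start:end]
    return layers[:start] + middle * times + layers[end:]

# A 6-block stack with blocks 2..3 repeated twice:
blocks = [0, 1, 2, 3, 4, 5]
expanded = repeat_middle_layers(blocks, 2, 4, times=2)
# expanded == [0, 1, 2, 3, 2, 3, 4, 5]
```

The format the post hopes for with TurboDerp would keep the repeated entries as references to the same weights, so only the KV cache grows.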
Summary: an AI discussion from r/LocalLLaMA around "Litellm 1.82.7 and 1.82.8 on PyPI are compromised, do...", focused on LLMs.
We have just been compromised, and thousands of people likely are as well. More details are being updated here: [https://futuresearch.ai/blog/litellm-pypi-supply-chain-attack/](https://futuresearch.ai/blog/litellm-pypi-supply-chain-attack/)
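For a quick local triage, a minimal sketch using only the standard library; the two version numbers come from the post title, and the linked advisory remains the authoritative source for remediation steps.

```python
from importlib import metadata

# The two releases named in the post title.
COMPROMISED = {"1.82.7", "1.82.8"}

def is_compromised(version: str) -> bool:
    return version in COMPROMISED

def check_installed() -> bool:
    """True if the locally installed litellm is one of the bad releases."""
    try:
        return is_compromised(metadata.version("litellm"))
    except metadata.PackageNotFoundError:
        return False  # litellm not installed in this environment
```

If the check comes back positive, treat credentials reachable from that environment as exposed rather than merely downgrading the package.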
Summary: an AI discussion from r/LocalLLaMA around "China's open-source dominance threatens US AI lead, U...", focused on the open-source ecosystem.
AI discussion from r/LocalLLaMA.
Summary: an AI discussion from r/LocalLLaMA around "FlashAttention-4: 1613 TFLOPs/s, 2.7x faster than Tri...", focused on model inference, research progress, and a product release.
Wrote a deep dive on **FlashAttention-4 (03/05/2026)** that's relevant for anyone thinking about inference performance.

**TL;DR for inference:**

* **BF16 forward: 1,613 TFLOPs/s on B200 (71% utilization). Attention is basically at matmul speed now.**
* **2.1-2.7x faster than Triton, up to 1.3x faster than cuDNN 9.13**
* **vLLM 0.17.0 (released March 7) integrates FA-4. If you're on B200, it's automatic.**
* **PyTorch FlexAttention also has an FA-4 backend (1.2-3.2x over the Triton backend)**
* **GQA and MQA fully supported (Llama, Mistral, Qwen, Gemma all work)**
* **Sliding window available via the window_size parameter**

**Bad news for most of us:** FA-4 is Hopper + Blackwell only. It works on H100/H800 and B200/B100, not on A100 or consumer cards. The optimizations exploit specific Blackwell hardware features (TMEM, 2-CTA MMA, async TMA) that don't exist on older GPUs.

**If you're on A100**: stay on FA-2. **If you're on H100**: FA-4 is supported, but gains are smaller than on Blackwell; worth testing. **If you're on B200**: just update vLLM and you're good.

*The article breaks down why softmax (not matmul) is now the bottleneck on Blackwell, how selective rescaling skips ~10x of the softmax correction work, and the full 5-stage pipeline architecture.*

*It also covers the Python angle: FA-4 is 100% CuTe-DSL (NVIDIA's Python kernel DSL). It compiles in 2.5 seconds vs 55 seconds for the C++ equivalent, with the same runtime perf. That's a big deal for kernel iteration speed.*

**Paper**: [https://arxiv.org/abs/2603.05451](https://arxiv.org/abs/2603.05451)
**Article free link**: [https://medium.com/ai-advances/flashattention-4-python-gpu-kernel-blackwell-2b18f51c8b32?sk=59bca93c369143e5f74fb0f86e57e6d0](https://medium.com/ai-advances/flashattention-4-python-gpu-kernel-blackwell-2b18f51c8b32?sk=59bca93c369143e5f74fb0f86e57e6d0)

**For those running local models:** the algorithmic ideas (selective rescaling, software-emulated exp) will likely trickle down to consumer GPUs eventually. The CuTe-DSL tooling is the real unlock for faster kernel development across the board.
Summary: an AI discussion from r/LocalLLaMA around "Are we currently in a "Golden Time" for low VRAM/1 GP...", focused on LLMs.
Really loving Qwen 27B more than any other LLM I can remember. It works so well. I have 48 GB of VRAM; can anyone recommend alternatives? It seems 24 GB is enough, and currently I can't think of any other open model to use.
Summary: an AI discussion from r/LocalLLaMA around "Which local model we running on the overland Jeep fel...", focused on LLMs.
AI discussion from r/LocalLLaMA.
Summary: an AI discussion from Hacker News around "NanoClaw Adopts OneCLI Agent Vault"; its visibility has been amplified by the leaderboard.
High-signal AI discussion trending on Hacker News.
Summary: an AI discussion from r/LocalLLaMA around "Another appreciation post for qwen3.5 27b model", focused on API integration and LLMs.
I tested qwen3.5 122b when it came out. I really liked it, and for my development tests it was on par with gemini 3 flash (my current AI tool for coding), so I was looking at investing in hardware; the problem is I need a new mobo and 1 (or 2) more 3090s, and the price is just too high right now.

I saw a lot of posts saying that qwen3.5 27b was better than 122b, which actually didn't make sense to me. Then I saw nemotron 3 super 120b, but people said it was not better than qwen3.5 122b, and I trusted them. Yesterday and today I tested all these models:

> "unsloth/Qwen3.5-27B-GGUF:UD-Q4_K_XL"
> "unsloth/Qwen3.5-35B-A3B-GGUF:UD-Q4_K_XL"
> "unsloth/Qwen3.5-122B-A10B-GGUF"
> "unsloth/Qwen3.5-27B-GGUF:UD-Q6_K_XL"
> "unsloth/Qwen3.5-27B-GGUF:UD-Q8_K_XL"
> "unsloth/NVIDIA-Nemotron-3-Super-120B-A12B-GGUF:UD-IQ4_XS"
> "unsloth/gpt-oss-120b-GGUF:F16"

I also tested against gpt-5.4 high so I could compare them better. To my surprise, nemotron was a very, very good model, on par with gpt-5.4, and qwen3.5-27b did great as well. Sadly (but also good), gpt-oss 120b and qwen3.5 122b performed worse than the other two models (good because they need more hardware). So I can finally use "Qwen3.5-27B-GGUF:UD-Q6_K_XL" for real development tasks locally; the best part is I don't need to get more hardware (I already own 2x 3090).

I am sorry for not providing more info, but I didn't save the tg/pp for all of them. Nemotron ran at 80 tg and about 2000 pp with 100k context on [vast.ai](http://vast.ai) with 4x RTX 3090, and Qwen3.5-27B Q6 at 803 pp, 25 tg with 256k context on [vast.ai](http://vast.ai) as well. I'll set it up locally, probably next week, for production use.

This is the command I used (pretty much copied from the unsloth page):

```
./llama.cpp/llama-server -hf unsloth/Qwen3.5-27B-GGUF:UD-Q6_K_XL --ctx-size 262144 --temp 0.6 --top-p 0.95 --top-k 20 --min-p 0.00 -ngl 999
```

P.D. I am so glad I can actually replace API subscriptions (at least for daily tasks); I'll continue using CODEX for complex tasks. If I had the hardware that nemotron-3-super 120b requires, I would use it instead; it also always responded in my own language (Spanish), while the others responded in English.
Summary: an AI discussion from r/artificial around "Mark Zuckerberg builds AI CEO to help him run Meta"; its visibility has been amplified by the leaderboard.
AI discussion from r/artificial.
Focuses on capability packs, MCP capabilities, and workflow skills that AI agent builders can reuse.
Summary: a reusable Agent Skill suited to agents and coding, mainly for platform capabilities, automation, and capability packaging.
Fair-code workflow automation platform with native AI capabilities. Combine visual building with custom code, self-host or cloud, 400+ integrations.
Summary: a reusable Agent Skill suited to agents and toolchains.
AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters.
Summary: a reusable Agent Skill suited to agents and toolchains, mainly for platform and assistant capabilities.
Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞
Summary: a reusable Agent Skill suited to coding and infrastructure, a good complement to your own agent workflows.
Interactive roadmaps, guides and other educational content to help developers grow in their careers.
Summary: a reusable Agent Skill suited to toolchains, mainly for machine learning, the open-source ecosystem, and framework capabilities.
An Open Source Machine Learning Framework for Everyone
Summary: a reusable Agent Skill suited to toolchains and design, a good complement to your own agent workflows.
Flutter makes it easy and fast to build beautiful apps for mobile and beyond
Summary: a reusable Agent Skill suited to toolchains, a good complement to your own agent workflows.
An opinionated list of Python frameworks, libraries, tools, and resources.
Summary: a reusable Agent Skill suited to toolchains, a good complement to your own agent workflows.
A list of Free Software network services and web applications which can be hosted on your own servers