QeRL: NVFP4-Quantized Reinforcement Learning (RL) Brings 32B LLM Training to a Single H100—While Improving Exploration

What would you build if you could run Reinforcement Learning (RL) post-training on a 32B LLM in 4-bit NVFP4, on a single H100, with BF16-level accuracy and 1.2–1.5× step speedups? NVIDIA researchers (with collaborators from MIT, HKU, and Tsinghua) have open-sourced QeRL (Quantization-enhanced Reinforcement Learning), a training framework that pushes RL post-training into 4-bit FP4 […]
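The excerpt does not include QeRL's implementation, but the format it names, NVFP4, is a block-scaled 4-bit floating-point format whose basic mechanic can be illustrated with a toy fake-quantization routine. The sketch below is not QeRL's code: the block size of 16 and the E2M1 value grid follow NVIDIA's published NVFP4 description, while the function name and the simplified FP32 block scales (real NVFP4 also quantizes the scales to FP8 E4M3) are illustrative assumptions.

```python
import numpy as np

# Representable magnitudes of a 4-bit E2M1 float: 0, 0.5, 1, 1.5, 2, 3, 4, 6
E2M1_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def fake_quant_nvfp4(w: np.ndarray, block: int = 16) -> np.ndarray:
    """Simulate NVFP4-style quantization: each block of 16 weights shares a
    scale, and scaled values snap to the nearest E2M1 grid point.
    (Hypothetical helper; real NVFP4 stores the scale itself in FP8 E4M3.)"""
    flat = w.reshape(-1, block)
    # Per-block scale so the largest magnitude maps onto E2M1's max (6.0).
    scale = np.abs(flat).max(axis=1, keepdims=True) / 6.0
    scale = np.where(scale == 0, 1.0, scale)
    scaled = flat / scale
    # Snap each scaled value to the nearest representable magnitude.
    idx = np.abs(np.abs(scaled)[..., None] - E2M1_GRID).argmin(axis=-1)
    deq = np.sign(scaled) * E2M1_GRID[idx] * scale
    return deq.reshape(w.shape)

w = np.random.randn(4, 16).astype(np.float32)
w_q = fake_quant_nvfp4(w)
print("max abs quantization error:", np.abs(w - w_q).max())
```

In an RL post-training loop of this kind, the quantized weights would serve the rollout/forward pass, which is where the memory and step-time savings come from.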
Building a Context-Folding LLM Agent for Long-Horizon Reasoning with Memory Compression and Tool Use

In this tutorial, we explore how to build a Context-Folding LLM Agent that efficiently solves long, complex tasks by intelligently managing limited context. We design the agent to break down a large task into smaller subtasks, perform reasoning or calculations when needed, and then fold each completed sub-trajectory into concise summaries. By doing this, we […]
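The tutorial's own code is not shown in this excerpt, so here is a minimal sketch of the folding mechanic it describes: keep a subtask's full trace only while the subtask is active, then replace it with a short summary so the working context stays bounded. All names (ContextFoldingAgent, run_subtask, summarize) and the summarization stub are hypothetical; a real agent would call the LLM to produce each summary.

```python
from dataclasses import dataclass, field

@dataclass
class ContextFoldingAgent:
    """Toy agent that folds completed sub-trajectories into summaries."""
    max_context_chars: int = 2000
    context: list = field(default_factory=list)  # holds only folded summaries

    def summarize(self, trace: list[str]) -> str:
        # Stub: a real agent would ask the LLM for a concise summary here.
        return f"[folded: {trace[0]} | {len(trace)} steps | result: {trace[-1]}]"

    def run_subtask(self, name: str, steps: list[str]) -> None:
        trace = [f"subtask: {name}"]
        for step in steps:
            trace.append(step)  # full detail exists only while the subtask runs
        self.context.append(self.summarize(trace))  # fold on completion

    def prompt(self) -> str:
        # What the LLM would actually see: compact summaries, capped in size.
        return "\n".join(self.context)[-self.max_context_chars:]

agent = ContextFoldingAgent()
agent.run_subtask("parse invoice", ["extract totals", "total=$1,204"])
agent.run_subtask("check budget", ["compare to cap $1,500", "under budget"])
print(agent.prompt())
```

The design point is that context growth is proportional to the number of subtasks (one summary each), not to the total number of reasoning steps.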
Anthropic Launches Claude Haiku 4.5: Small AI Model that Delivers Sonnet-4-Level Coding Performance at One-Third the Cost and more than Twice the Speed

Anthropic released Claude Haiku 4.5, a latency-optimized “small” model that delivers coding performance similar to Claude Sonnet 4 while running more than twice as fast at one-third the cost. The model is immediately available via Anthropic’s API and in partner catalogs on Amazon Bedrock and Google Cloud Vertex AI. Pricing is $1/MTok input […]
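Since the post notes immediate API availability, a minimal call through Anthropic's Python SDK would look like the sketch below. The Messages API shape is the SDK's standard one; the model identifier string is an assumption and should be checked against Anthropic's current model list.

```python
# pip install anthropic
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-haiku-4-5",  # assumed identifier; verify in Anthropic's docs
    max_tokens=512,
    messages=[
        {"role": "user", "content": "Write a Python function that merges two sorted lists."}
    ],
)
print(response.content[0].text)
```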