How I Study AI - Learn AI Papers & Lectures the Easy Way

Papers (1262)


LMEB: Long-horizon Memory Embedding Benchmark

Intermediate
Xinping Zhao, Xinshuo Hu et al. · Mar 13 · arXiv

LMEB is a new test that checks whether text-embedding models can remember and find information across long stretches of time, not just in short, neat passages.

#LMEB · #long-horizon memory retrieval · #memory embeddings

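The kind of task LMEB evaluates can be sketched in a few lines: embed every past passage, then retrieve the best match for a query, no matter how long ago the passage was stored. The bag-of-words `embed` below is a hypothetical stand-in for a real text-embedding model; this is an illustration of long-horizon retrieval, not the benchmark's actual protocol.

```python
import math

def embed(text):
    # Toy bag-of-words embedding; a real system would call a learned
    # text-embedding model here.
    counts = {}
    for token in text.lower().split():
        counts[token] = counts.get(token, 0.0) + 1.0
    norm = math.sqrt(sum(v * v for v in counts.values()))
    return {t: v / norm for t, v in counts.items()}

def cosine(a, b):
    # Cosine similarity between two sparse unit vectors.
    return sum(w * b.get(t, 0.0) for t, w in a.items())

def retrieve(query, memory):
    # Long-horizon retrieval: rank every stored passage, however old,
    # by similarity to the query embedding.
    q = embed(query)
    return max(memory, key=lambda passage: cosine(q, embed(passage)))

memory = [
    "Day 3: the user adopted a cat named Miso.",
    "Day 47: the user switched jobs to a robotics startup.",
    "Day 190: the user moved to Lisbon.",
]
print(retrieve("has the user moved recently", memory))  # -> the Day 190 entry
```

The benchmark's point is that real passages are neither short nor neat, so the embedding model, unlike this toy, must compress long, messy context without losing the retrievable facts.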

Penguin-VL: Exploring the Efficiency Limits of VLM with LLM-based Vision Encoders

Intermediate
Boqiang Zhang, Lei Ke et al. · Mar 6 · arXiv

Penguin-VL shows that small vision-language models (2B and 8B) can be very strong if you give them a better vision encoder, not just a bigger brain.

#Vision Language Model · #LLM-based Vision Encoder · #Contrastive Learning

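The tags point to contrastive learning, the standard way vision encoders are aligned with text. A minimal InfoNCE sketch over one row of an image-text similarity matrix (the generic objective, not necessarily Penguin-VL's exact recipe):

```python
import math

def info_nce(sims, positive_index, temperature=0.07):
    # Contrastive (InfoNCE) loss: minimize -log softmax(positive), i.e.
    # pull the matching image-text pair together and push the rest apart.
    logits = [s / temperature for s in sims]
    m = max(logits)  # subtract the max for numerical stability
    log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_denom - logits[positive_index]

# Similarities of one image against four captions; index 0 is the true pair.
print(info_nce([0.9, 0.1, 0.05, -0.2], positive_index=0))
```

When the true pair already has the highest similarity the loss is near zero; point it at a mismatched caption and the loss grows sharply.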

Reasoning Models Struggle to Control their Chains of Thought

Beginner
Chen Yueh-Han, Robert McCarthy et al. · Mar 5 · arXiv

The paper studies whether AI models can hide or reshape their step-by-step thoughts (chains of thought) on command.

#chain-of-thought · #controllability · #monitorability


RoboPocket: Improve Robot Policies Instantly with Your Phone

Intermediate
Junjie Fang, Wendi Chen et al. · Mar 5 · arXiv

RoboPocket turns an ordinary smartphone into a pocket robot coach that helps you fix robot mistakes instantly, without touching the robot.

#RoboPocket · #Imitation Learning · #Interactive Imitation Learning


Towards Multimodal Lifelong Understanding: A Dataset and Agentic Baseline

Beginner
Guo Chen, Lidong Lu et al. · Mar 5 · arXiv

This paper introduces MM-Lifelong, a 181-hour, multi-scale video dataset designed to test AI on true long-term (lifelong) understanding across days to months.

#multimodal lifelong understanding · #long video reasoning · #working memory bottleneck


RealWonder: Real-Time Physical Action-Conditioned Video Generation

Intermediate
Wei Liu, Ziyu Chen et al. · Mar 5 · arXiv

RealWonder is a system that turns a single picture and 3D physical actions (like pushes, wind, and robot gripper moves) into a realistic video in real time.

#action-conditioned video generation · #physics simulation · #optical flow


On-Policy Self-Distillation for Reasoning Compression

Beginner
Hejian Sang, Yuanda Xu et al. · Mar 5 · arXiv

Reasoning models often talk too much, and the extra words can actually make them more wrong; this paper uses on-policy self-distillation to train models to reason in fewer tokens.

#on-policy self-distillation · #reasoning compression · #conciseness instruction


Progressive Residual Warmup for Language Model Pretraining

Intermediate
Tianhao Chen, Xin Xu et al. · Mar 5 · arXiv

Training big Transformers can wobble at the start because every layer tries to learn all at once; Progressive Residual Warmup (ProRes) steadies early training by bringing the layers in gradually.

#Progressive Residual Warmup · #ProRes · #Transformer training stability


UltraDexGrasp: Learning Universal Dexterous Grasping for Bimanual Robots with Synthetic Data

Intermediate
Sizhe Yang, Yiman Xie et al. · Mar 5 · arXiv

Robots need many different ways to grab things, just as people use pinch, tripod, whole-hand, or two-handed grips; UltraDexGrasp learns a universal grasping policy for bimanual robots from synthetic data.

#bimanual dexterous grasping · #universal grasp policy · #synthetic data generation


KARL: Knowledge Agents via Reinforcement Learning

Beginner
Jonathan D. Chang, Andrew Drozdov et al. · Mar 5 · arXiv

KARL is a smart search helper that learns to look up information step by step and explain answers using the facts it finds.

#grounded reasoning · #enterprise search · #reinforcement learning


BandPO: Bridging Trust Regions and Ratio Clipping via Probability-Aware Bounds for LLM Reinforcement Learning

Intermediate
Yuan Li, Bo Wang et al. · Mar 5 · arXiv

BandPO is a new training method for large language models that keeps updates safe while letting the model freely explore smart, low-probability ideas.

#BandPO · #PPO clipping · #trust region

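BandPO positions itself against PPO-style ratio clipping. A minimal sketch of that clipped-surrogate baseline, so the "safe updates" trade-off is concrete (generic PPO, not BandPO's probability-aware bounds):

```python
import math

def clipped_surrogate(logp_new, logp_old, advantage, eps=0.2):
    # PPO ratio clipping: once the probability ratio leaves
    # [1 - eps, 1 + eps], the surrogate stops rewarding further movement,
    # which keeps updates safe but also caps credit for rare tokens.
    ratio = math.exp(logp_new - logp_old)
    clipped = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped * advantage)

# A low-probability token whose likelihood triples gets no extra credit:
print(clipped_surrogate(math.log(0.03), math.log(0.01), advantage=1.0))  # ratio 3.0 is clipped to 1.2
```

That cap on low-probability moves is exactly the exploration bottleneck the summary says BandPO tries to loosen while staying inside a trust region.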

Locality-Attending Vision Transformer

Intermediate
Sina Hajimiri, Farzad Beizaee et al. · Mar 5 · arXiv

Vision Transformers (ViTs) are great at recognizing what is in a whole image but often blur the tiny details needed to label each pixel (segmentation).

#Vision Transformer · #self-attention · #segmentation

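For context, this is the global scaled dot-product attention that ViTs apply across all patches, the operation the paper argues washes out local, per-pixel detail; the locality-attending mechanism itself is not reproduced here.

```python
import math

def attention(query, keys, values):
    # Scaled dot-product attention for a single query vector:
    # softmax(q . k / sqrt(d)) weights a global average over all values.
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]

# One patch token attending over three patch embeddings: even a poorly
# matching patch still contributes, which is what smears fine detail.
out = attention([1.0, 0.0],
                keys=[[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]],
                values=[[2.0, 0.0], [0.0, 2.0], [-2.0, 0.0]])
print(out)
```

Because the softmax never assigns exactly zero weight, every patch mixes information from the whole image, which helps classification but blurs the sharp boundaries segmentation needs.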
