Arxiv Papers

AGENTSNET is a new benchmark for evaluating collaborative problem-solving, self-organization, and communication in multi-agent systems, revealing performance limitations among large language models as network size increases. https://arxiv.org/abs/2507.08616 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016 Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers

Duration:00:07:37

AGENTSNET: Coordination and Collaborative Reasoning in Multi-Agent LLMs

Duration:00:19:47

[QA] One Token to Fool LLM-as-a-Judge

Generative reward models that use LLMs to evaluate answer quality are vulnerable to superficial manipulations, motivating improved evaluation methods and a new, more robust model. https://arxiv.org/abs/2507.08794

Duration:00:07:56

One Token to Fool LLM-as-a-Judge

Duration:00:17:55

[QA] Should We Still Pretrain Encoders with Masked Language Modeling?

This paper compares Masked Language Modeling (MLM) and Causal Language Modeling (CLM) for text representation. It finds that MLM generally performs better, while CLM offers data efficiency and stability, which suggests a biphasic training strategy. https://arxiv.org/abs/2507.00994

Duration:00:08:09

Should We Still Pretrain Encoders with Masked Language Modeling?

Duration:00:16:52

[QA] Token Bottleneck: One Token to Remember Dynamics

The paper presents Token Bottleneck (ToBo), a self-supervised learning method for compact visual representations that improves sequential scene understanding and proves effective across various tasks and real-world applications. https://arxiv.org/abs/2507.06543

Duration:00:07:30

Token Bottleneck: One Token to Remember Dynamics

Duration:00:16:06

[QA] A Systematic Analysis of Hybrid Linear Attention

https://arxiv.org/abs/2507.06457

Duration:00:07:55

A Systematic Analysis of Hybrid Linear Attention

Duration:00:15:40

[QA] First Return, Entropy-Eliciting Explore

https://arxiv.org/abs/2507.07017

Duration:00:07:43

First Return, Entropy-Eliciting Explore

Duration:00:21:32

[QA] Skip a Layer or Loop it? Test-Time Depth Adaptation of Pretrained LLMs

Pretrained neural networks can adapt their depth to each input by skipping or looping layers without finetuning, which improves efficiency and performance; the paper shows this using Monte Carlo Tree Search optimization. https://arxiv.org/abs/2507.07996

Duration:00:08:31

Skip a Layer or Loop it? Test-Time Depth Adaptation of Pretrained LLMs

Duration:00:15:32

[QA] Scaling RL to Long Videos

https://arxiv.org/abs/2507.07966

Duration:00:08:19

Scaling RL to Long Videos

Duration:00:15:24

[QA] Towards Solving More Challenging IMO Problems via Decoupled Reasoning and Proving

The paper proposes a decoupled framework for Automated Theorem Proving that uses specialized models for reasoning and proving, improving performance on both and achieving success on challenging mathematical problems. https://arxiv.org/abs/2507.06804

Duration:00:08:09

Towards Solving More Challenging IMO Problems via Decoupled Reasoning and Proving

Duration:00:21:33

[QA] Small Batch Size Training for Language Models: When Vanilla SGD Works, and Why Gradient Accumulation Is Wasteful

This paper challenges the conventional wisdom about small batch sizes in language model training, showing that they can be stable, robust, and efficient, and provides guidelines for hyperparameter adjustments and batch size selection. https://arxiv.org/abs/2507.07101

Duration:00:07:03

Small Batch Size Training for Language Models: When Vanilla SGD Works, and Why Gradient Accumulation Is Wasteful