Papers
arxiv:2601.03153

Parallel Latent Reasoning for Sequential Recommendation

Published on Jan 6
· Submitted by
TangJiakai
on Jan 7

Abstract

The Parallel Latent Reasoning framework improves sequential recommendation by exploring multiple diverse reasoning trajectories simultaneously through learnable trigger tokens and adaptive aggregation.

AI-generated summary

Capturing complex user preferences from sparse behavioral sequences remains a fundamental challenge in sequential recommendation. Recent latent reasoning methods have shown promise by extending test-time computation through multi-step reasoning, yet they exclusively rely on depth-level scaling along a single trajectory, suffering from diminishing returns as reasoning depth increases. To address this limitation, we propose Parallel Latent Reasoning (PLR), a novel framework that pioneers width-level computational scaling by exploring multiple diverse reasoning trajectories simultaneously. PLR constructs parallel reasoning streams through learnable trigger tokens in continuous latent space, preserves diversity across streams via global reasoning regularization, and adaptively synthesizes multi-stream outputs through mixture-of-reasoning-streams aggregation. Extensive experiments on three real-world datasets demonstrate that PLR substantially outperforms state-of-the-art baselines while maintaining real-time inference efficiency. Theoretical analysis further validates the effectiveness of parallel reasoning in improving generalization capability. Our work opens new avenues for enhancing reasoning capacity in sequential recommendation beyond existing depth scaling.

Community


Parallel Latent Reasoning (PLR): Width-Level Scaling for Sequential Recommendation 🔥
📉 Depth-only reasoning often hits performance plateaus; PLR mitigates this by reasoning along multiple latent trajectories in parallel.

Core Innovation ✨
🎯 Learnable trigger tokens: Build parallel streams in continuous latent space.
🔄 Global regularization: Preserve stream diversity to avoid redundancy.
⚖️ Adaptive aggregation: Smartly combine multi-stream insights for optimal results.
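The three mechanisms above can be sketched in miniature. The snippet below is a hypothetical NumPy illustration, not the paper's implementation: `aggregate_streams` shows a softmax-gated mixture over parallel stream outputs (the "mixture-of-reasoning-streams" idea), and `diversity_penalty` shows one plausible form of a global regularizer that discourages redundant streams. The gating parameters and penalty form are assumptions for illustration.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def aggregate_streams(streams, gate_w):
    """Mixture-of-reasoning-streams sketch: softmax-gated weighted sum.

    streams: (K, d) latent outputs from K parallel reasoning streams.
    gate_w:  (d,) hypothetical gating parameters that score each stream.
    Returns the (d,) aggregated user representation.
    """
    scores = streams @ gate_w      # one relevance score per stream, shape (K,)
    weights = softmax(scores)      # adaptive, input-dependent stream weights
    return weights @ streams       # weighted combination of stream outputs

def diversity_penalty(streams):
    """Global-regularization sketch: mean pairwise cosine similarity
    between streams. Lower values mean more diverse streams, so this
    quantity would be *added* to the training loss to penalize redundancy.
    """
    normed = streams / np.linalg.norm(streams, axis=1, keepdims=True)
    sim = normed @ normed.T                  # (K, K) cosine similarities
    k = len(streams)
    return sim[~np.eye(k, dtype=bool)].mean()  # average off-diagonal entry
```

For example, four identical streams give a penalty of 1.0, while near-orthogonal streams give a penalty near 0, so minimizing it pushes the trigger tokens to produce genuinely different trajectories.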

Key Advantages 🚀
📊 Outperforms SOTA baselines (SASRec, BERT4Rec, ReaRec, LRESA) by 5.5%–14.9% on Recall@10/20 and NDCG@10/20 across three real-world datasets.
⚡ Real-time efficiency: Only 5.8% latency increase vs. base models, enabled by KV Caching and GPU parallelism.
🛡️ Strong robustness: Maintains top performance even with 30% missing user interactions.
