Parallel Latent Reasoning for Sequential Recommendation
Abstract
The Parallel Latent Reasoning framework improves sequential recommendation by exploring multiple diverse reasoning trajectories simultaneously, using learnable trigger tokens and adaptive aggregation.
Capturing complex user preferences from sparse behavioral sequences remains a fundamental challenge in sequential recommendation. Recent latent reasoning methods have shown promise by extending test-time computation through multi-step reasoning, yet they exclusively rely on depth-level scaling along a single trajectory, suffering from diminishing returns as reasoning depth increases. To address this limitation, we propose Parallel Latent Reasoning (PLR), a novel framework that pioneers width-level computational scaling by exploring multiple diverse reasoning trajectories simultaneously. PLR constructs parallel reasoning streams through learnable trigger tokens in continuous latent space, preserves diversity across streams via global reasoning regularization, and adaptively synthesizes multi-stream outputs through mixture-of-reasoning-streams aggregation. Extensive experiments on three real-world datasets demonstrate that PLR substantially outperforms state-of-the-art baselines while maintaining real-time inference efficiency. Theoretical analysis further validates the effectiveness of parallel reasoning in improving generalization capability. Our work opens new avenues for enhancing reasoning capacity in sequential recommendation beyond existing depth scaling.
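To make the architecture concrete, here is a minimal sketch of how such a framework could look, assuming a shared Transformer backbone over embedded interaction sequences. Every name here (`PLRSketch`, `trigger_tokens`, `gate`, `num_streams`) is ours for illustration, not the paper's actual implementation:

```python
import torch
import torch.nn as nn


class PLRSketch(nn.Module):
    """Minimal sketch of Parallel Latent Reasoning (PLR).

    K learnable trigger tokens each seed one latent reasoning stream;
    a gating network mixes the K stream outputs into one user state.
    All module and variable names are illustrative, not from the paper.
    """

    def __init__(self, d_model: int, num_streams: int, encoder: nn.Module):
        super().__init__()
        self.encoder = encoder  # shared sequential backbone (e.g., SASRec-style)
        # One learnable trigger token per parallel reasoning stream.
        self.trigger_tokens = nn.Parameter(torch.randn(num_streams, d_model))
        # Gate for mixture-of-reasoning-streams aggregation.
        self.gate = nn.Linear(d_model, 1)

    def forward(self, seq_emb: torch.Tensor) -> torch.Tensor:
        # seq_emb: (batch, seq_len, d_model) embedded interaction history.
        B, K = seq_emb.size(0), self.trigger_tokens.size(0)
        # Append a distinct trigger token to the sequence for each stream,
        # then run all K streams as one enlarged batch.
        trig = self.trigger_tokens.unsqueeze(1).expand(K, B, -1)   # (K, B, d)
        seqs = torch.cat(
            [seq_emb.unsqueeze(0).expand(K, -1, -1, -1),           # (K, B, L, d)
             trig.unsqueeze(2)], dim=2,                            # (K, B, L+1, d)
        ).reshape(K * B, -1, seq_emb.size(-1))
        hidden = self.encoder(seqs)                                # (K*B, L+1, d)
        # Final position of each stream = that stream's latent reasoning state.
        states = hidden[:, -1, :].reshape(K, B, -1)                # (K, B, d)
        # Adaptive aggregation: softmax gate over the K streams.
        weights = torch.softmax(self.gate(states), dim=0)          # (K, B, 1)
        return (weights * states).sum(dim=0)                       # (B, d)
```

Folding the K streams into the batch dimension is what lets width-level scaling reuse standard GPU parallelism, which fits the real-time efficiency claim below.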
Community
Parallel Latent Reasoning (PLR): Sequential Recommendation with Parallel Reasoning 🔥
📉 Depth-only reasoning often hits performance plateaus—PLR mitigates this with parallel latent reasoning.
Core Innovation ✨
🎯 Learnable trigger tokens: Build parallel streams in continuous latent space.
🔄 Global regularization: Preserve stream diversity to avoid redundancy (see the sketch after this list).
⚖️ Adaptive aggregation: Smartly combine multi-stream insights for optimal results.
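One plausible reading of the global reasoning regularization is a penalty on pairwise similarity between the per-stream latent states. The sketch below is our interpretation, not the paper's exact loss; it reuses the `states` tensor shape from `PLRSketch` above:

```python
import torch
import torch.nn.functional as F


def stream_diversity_penalty(states: torch.Tensor) -> torch.Tensor:
    """Penalize redundancy across parallel reasoning streams.

    states: (K, B, d) latent states, one per stream. Returns the mean
    pairwise cosine similarity between distinct streams; adding this
    term to the training loss pushes the streams apart. This is our
    interpretation, not the paper's exact formulation.
    """
    K, B = states.size(0), states.size(1)
    if K < 2:  # nothing to diversify with a single stream
        return states.new_zeros(())
    z = F.normalize(states, dim=-1)              # unit-norm per state
    sim = torch.einsum("kbd,jbd->kjb", z, z)     # (K, K, B) cosine sims
    mask = 1.0 - torch.eye(K, device=sim.device) # zero out self-similarity
    # Average over the K*(K-1) off-diagonal pairs and the batch.
    return (sim * mask.unsqueeze(-1)).sum() / (K * (K - 1) * B)
```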
Key Advantages 🚀
📊 Outperforms SOTA baselines (SASRec, BERT4Rec, ReaRec, LRESA) by 5.5%–14.9% on Recall@10/20 and NDCG@10/20 across three real-world datasets.
⚡ Real-time efficiency: Only a 5.8% latency increase vs. base models, enabled by KV caching and GPU parallelism (timing sketch after this list).
🛡️ Strong robustness: Maintains top performance even with 30% missing user interactions.
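To make the efficiency point concrete: since the K streams are folded into the batch dimension (as in `PLRSketch` above), a single forward pass serves all of them. A hypothetical timing harness with illustrative sizes:

```python
import time
import torch
import torch.nn as nn

# Hypothetical setup reusing PLRSketch from above; sizes are illustrative.
d, K, B, L = 64, 4, 32, 50
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=d, nhead=4, batch_first=True),
    num_layers=2,
)
model = PLRSketch(d_model=d, num_streams=K, encoder=encoder).eval()
seq_emb = torch.randn(B, L, d)  # stand-in for embedded user histories

with torch.no_grad():
    start = time.perf_counter()
    user_state = model(seq_emb)  # all K streams in one batched pass
    elapsed_ms = (time.perf_counter() - start) * 1e3
    print(f"{K} streams in one pass: {elapsed_ms:.1f} ms, "
          f"output shape {tuple(user_state.shape)}")  # (B, d)
```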
This is an automated message from the Librarian Bot. The following similar papers were recommended by the Semantic Scholar API:
- Intent-Guided Reasoning for Sequential Recommendation (2025)
- DTRec: Learning Dynamic Reasoning Trajectories for Sequential Recommendation (2025)
- HyMoERec: Hybrid Mixture-of-Experts for Sequential Recommendation (2025)
- SpiralThinker: Latent Reasoning through an Iterative Process with Text-Latent Interleaving (2025)
- SCoTER: Structured Chain-of-Thought Transfer for Enhanced Recommendation (2025)
- ReaSeq: Unleashing World Knowledge via Reasoning for Sequential Modeling (2025)
- Improving Latent Reasoning in LLMs via Soft Concept Mixing (2025)