We present Temporal Difference Learning for Model Predictive Control with Policy Constraint (TD-M(PC)\(^2\)), a simple yet effective approach built on TD-MPC2 that allows a planning-based MBRL algorithm to better exploit off-policy data. Without adding computational overhead or requiring environment-specific hyperparameter tuning, it seamlessly inherits the desirable features of the \textit{state-of-the-art} pipeline and consistently improves its performance on continuous control problems. On complex 61-DoF locomotion tasks in HumanoidBench, TD-M(PC)\(^2\) improves final average performance by over 100\% relative to the baseline.
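As an illustration of what a policy constraint can look like (one possible TD3+BC-style form, not necessarily the exact objective used by TD-M(PC)\(^2\)), the policy prior \(\pi_\theta\) can be trained to maximize a learned value \(Q_\phi\) while staying close to the actions \(a\) stored in the replay buffer \(\mathcal{B}\), with \(\beta > 0\) controlling the constraint strength:
\[
\mathcal{L}_\pi(\theta) \;=\; \mathbb{E}_{(z,a)\sim\mathcal{B}}\!\left[\, -\,Q_\phi\bigl(z, \pi_\theta(z)\bigr) \;+\; \beta\,\bigl\|\pi_\theta(z) - a\bigr\|_2^2 \,\right],
\]
where \(z\) denotes the latent state; the second term anchors the policy to the data-collecting (planner) behavior, which is what lets the value function be learned reliably from off-policy data.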