# On the complexity of random satisfiability problems with planted solutions

## Abstract

The problem of identifying a planted assignment given a random k-SAT formula consistent with the assignment exhibits a large algorithmic gap: while the planted solution can always be identified given a formula with O(nlogn) clauses, there are distributions over clauses for which the best known efficient algorithms require nk/2 clauses. We propose and study a unified model for planted k-SAT, which captures well-known special cases. An instance is described by a planted assignment σ and a distribution on clauses with k literals. We define its distribution complexity as the largest r for which the distribution is not r-wise independent (1 ≤ r ≤ k for any distribution with a planted assignment). Our main result is an unconditional lower bound, tight up to logarithmic factors, of Ω(nr-2 ) clauses for statistical algorithms [42, 30], matching the known upper bound (which, as we show, can be implemented using a statistical algorithm) [31]. Since known approaches for problems over distributions have statistical analogues (spectral, MCMC, gradient-based, convex optimization etc.), this lower bound provides a rigorous explanation of the observed algorithmic gap. The proof introduces a new general technique for the analysis of statistical algorithms. It also points to a geometric paring phenomenon in the space of all planted assignments. We describe consequences of our lower bounds to Feige's refutation hypothesis [26] and to lower bounds on general convex programs that solve planted k-SAT. Our bounds also extend to the planted k-CSP model, defined by Goldreich as a candidate for one-way function [37], and therefore provide concrete evidence for the security of Goldreich's one-way function and the associated pseudorandom generator when used with a sufficiently hard predicate.