Suppose we have many copies of an unknown \(n\)-qubit state \(\rho\). We measure some copies of \(\rho\) using a known two-outcome measurement \(E_{1}\), then other copies using a measurement \(E_{2}\), and so on. At each stage \(t\), we generate a current hypothesis \(\omega_{t}\) about the state \(\rho\), using the outcomes of the previous measurements. We show that it is possible to do this in a way that guarantees that \(\left\vert \text{Tr} \!\left( E_{i} \omega_{t}\right) -\text{Tr} \!\left( E_{i}\rho\right) \right\vert \), the error in our prediction for the next measurement, is at least \(\varepsilon\) at most \(\operatorname{O}\!\left( n / \varepsilon^2 \right)\) times. Even in the non-realizable setting – where there could be arbitrary noise in the measurement outcomes – we show how to output hypothesis states that incur at most \(\operatorname{O}( \sqrt {Tn}\;)\) excess loss over the best possible state on the first \(T\) measurements. These results generalize a 2007 theorem by Aaronson on the PAC-learnability of quantum states, to the online and regret-minimization settings. We give three different ways to prove our results – using convex optimization, quantum postselection, and sequential fat-shattering dimension – which have different advantages in terms of parameters and portability.