Solving (some) formal math olympiad problems


We built a neural theorem prover for Lean that learned to solve a variety of challenging high-school olympiad problems, including problems from the AMC12 and AIME competitions, as well as two problems adapted from the IMO. The prover uses a language model to find proofs of formal statements. Each time we find a new proof, we use it as new training data, which improves the neural network and enables it to iteratively find solutions to harder and harder statements.

We achieved a new state of the art (41.2% vs 29.3%) on the miniF2F benchmark, a challenging collection of high-school olympiad problems. Our approach, which we call statement curriculum learning, consists of manually collecting a set of statements of varying difficulty levels (without proofs), where the hardest statements are similar to the benchmark we target. Initially our neural prover is weak and can only prove a few of them. We iteratively search for new proofs and re-train our neural network on the newly discovered proofs, and after 8 iterations, our prover ends up being vastly superior when tested on miniF2F.
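The loop described above can be sketched in a few lines of Python. This is a toy illustration under loud assumptions, not the actual training code: `ToyProver`, its integer `skill`, the integer difficulty levels, and the rule that training on found proofs lifts the prover exactly one level are all hypothetical stand-ins for the neural network, proof search, and re-training.

```python
from typing import Dict, Optional, Set


class ToyProver:
    """Hypothetical stand-in for the neural prover (not the real model)."""

    def __init__(self) -> None:
        # Initially weak: only the easiest statements are within reach.
        self.skill = 1

    def search(self, difficulty: int) -> Optional[str]:
        """Pretend proof search: succeeds only if the statement is within reach."""
        if difficulty <= self.skill:
            return f"proof-of-difficulty-{difficulty}"
        return None

    def retrain(self, proved: Dict[str, int]) -> None:
        """Pretend training: proofs found so far lift the prover one level higher."""
        if proved:
            self.skill = 1 + max(proved.values())


def statement_curriculum_learning(statements: Dict[str, int], iterations: int) -> Set[str]:
    """Alternate proof search and re-training over a fixed set of statements.

    `statements` maps a statement name to a toy difficulty level. Only
    statements are supplied; every proof is found by the prover itself.
    """
    prover = ToyProver()
    proved: Dict[str, int] = {}
    for _ in range(iterations):
        for name, difficulty in statements.items():
            if name not in proved and prover.search(difficulty) is not None:
                proved[name] = difficulty
        prover.retrain(proved)
    return set(proved)


graded = {f"s{d}": d for d in range(1, 9)}     # difficulties 1..8, no gaps
gapped = {"s1": 1, "s2": 2, "s5": 5, "s6": 6}  # nothing between levels 2 and 5

solved_graded = statement_curriculum_learning(graded, iterations=8)
solved_gapped = statement_curriculum_learning(gapped, iterations=8)
```

In this toy, the evenly graded set is solved completely after 8 iterations, while the set with a gap in difficulty stalls after its two easiest statements. That mirrors, in a very simplified way, why the collected statements need to span a range of difficulty levels.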

Formal mathematics is an exciting domain to study because of (i) its richness, letting you prove arbitrary theorems that require reasoning, creativity and insight, and (ii) its similarity to games, where AI has been spectacularly successful, in that it has an automated way of determining whether a proof is successful (i.e., verified by the formal system). As demonstrated in the trivial example below, proving a formal statement requires generating a sequence of proof steps, each proof step consisting of a call to a tactic. These tactics take mathematical terms as arguments, and each tactic call transforms the current statement to prove into statements that are easier to prove, until nothing is left to prove.

Problem 1

Adapted from AMC12 2000 Problem 5

Prove that if $|x - 2| = p$, where $x < 2$, then $x - p = 2 - 2p$.

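theorem amc12_2000_p5      -- theorem name
  (x p : ℝ)                -- the statement we want
  (h₀ : x < 2)             --   to prove
  (h₁ : abs (x - 2) = p) :
  x - p = 2 - 2 * p :=
begin                      -- formal proof starts here
  -- This first tactic requires the prover to invent
  -- the term `abs (x - 2) = -(x - 2)`.
  have h₂ : abs (x - 2) = -(x - 2), {
    apply abs_of_neg,
    linarith,
  },
  rw h₁ at h₂,
  -- The remaining goal is `x - p = 2 - 2 * p`,
  -- knowing that `p = -(x - 2)`.
  linarith,
end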
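For readers less familiar with Lean, the same argument can be written out by hand. This is an informal restatement of the steps the formal proof takes (the intermediate fact `h₂`, the rewrite, then linear arithmetic), not additional output of the prover:

```latex
\begin{align*}
x < 2 \;&\Longrightarrow\; x - 2 < 0 \;\Longrightarrow\; |x - 2| = -(x - 2) = 2 - x, \\
|x - 2| = p \;&\Longrightarrow\; p = 2 - x \;\Longrightarrow\; x = 2 - p, \\
x - p &= (2 - p) - p = 2 - 2p.
\end{align*}
```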

We observe that the capability to generate original mathematical terms required as arguments of tactics, which cannot be done without a neural language model, emerges from our training procedure. The proof below is an example of it: the proof step use n + 1 (entirely generated by our models) proposes to use n + 1 as a solution, the rest of the formal proof relying on the ring_exp tactic to verify that it is indeed valid.

Problem 2

Adapted from AMC12B 2020 Problem 6

For all integers $n ≥ 9$, prove that $((n + 2)! − (n + 1)!) / n!$ is a perfect square.

theorem amc12b_2020_p6
  (n : ℕ)
  (h0 : 9 ≤ n) :
  ∃ x : ℕ, (x:ℝ)^2 =
    (nat.factorial (n + 2) - nat.factorial (n + 1))
    / nat.factorial n :=
begin
  -- The model proposes `n + 1` as the witness.
  use n + 1,
  field_simp [nat.factorial_ne_zero, pow_succ'],
  ring_exp
end
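The witness $n + 1$ can be checked by hand. The short derivation below is an informal counterpart of what the simplification and ring-normalization steps verify formally:

```latex
\begin{align*}
\frac{(n+2)! - (n+1)!}{n!}
  &= \frac{(n+1)!\,\bigl((n+2) - 1\bigr)}{n!} \\
  &= \frac{(n+1)!}{n!}\,(n+1) \\
  &= (n+1)^2 .
\end{align*}
```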

We also observe that our models and search procedure are capable of producing proofs that chain multiple non-trivial reasoning steps. In the proof below, the model starts by using contraposition, leading to the existential statement (∃ (x : ℝ), f x ≠ a * x + b). It then generates a witness for it with use (0 : ℝ) and finishes the proof by leveraging the norm_num tactic.

Problem 3

Adapted from the MATH dataset

Let $f(x) = Ax + B$ and $g(x) = Bx + A$, where $A \neq B$. If $f(g(x)) - g(f(x)) = B - A$, prove that $A + B = 0$.

theorem mathd_train_algebra_217
  (a b : ℝ)
  (f g : ℝ → ℝ)
  (h₀ : ∀ x, f x = a * x + b)
  (h₁ : ∀ x, f x = b * x + a)
  (h₂ : a ≠ b)
  (h₃ : ∀ x, f (g x) - g (f x) = b - a) :
  a + b = 0 :=
begin
  revert h₀ h₁ h₂ h₃,
  -- Initial contraposition.
  contrapose!,
  rintro ⟨h₀, ⟨h₁, h₂⟩⟩,
  -- The model proposes `0` as a witness for the current
  -- goal `∃ (x : ℝ), f x ≠ a * x + b`.
  use (0 : ℝ),
  simp only [sub_eq_iff_eq_add, h₀, mul_zero, zero_add],
  norm_num at h₀,
end

Our models, trained with statement curriculum learning, were able to close a variety of problems from training textbooks as well as AMC12 and AIME competitions, and 2 problems adapted from the IMO. We present below three examples of such generated proofs.

Problem 4

Adapted from IMO 1964 Problem 2

Suppose $a$, $b$, $c$ are the sides of a triangle.
Prove that $a^2(b + c − a) + b^2(c + a − b) + c^2(a + b − c) \leq 3abc$.

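theorem imo_1964_p2
  (a b c : ℝ)
  -- The source text is truncated after `h₀ : 0`; the hypotheses and goal
  -- below are reconstructed from the informal statement above
  -- (positive side lengths satisfying the triangle inequalities).
  (h₀ : 0 < a ∧ 0 < b ∧ 0 < c)
  (h₁ : c < a + b)
  (h₂ : b < a + c)
  (h₃ : a < b + c) :
  a^2 * (b + c - a) + b^2 * (c + a - b) + c^2 * (a + b - c) ≤ 3 * a * b * c :=
begin
  -- `nlinarith` is given three non-negative squares as hints.
  nlinarith [sq_nonneg (b - a),
             sq_nonneg (c - b),
             sq_nonneg (c - a)]
end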

Problem 5

Adapted from AIME 1984 Problem 1

Prove that $a_2 + a_4 + a_6 + a_8 + \dots + a_{98} = 93$ if $a_1$, $a_2$, $a_3, \dots$ is an arithmetic progression with common difference $1$, and $a_1 + a_2 + a_3 + \dots + a_{98} = 137$.

theorem aime_1984_p1
  (u : ℕ → ℚ)
  (h₀ : ∀ n, u (n + 1) = u n + 1)
  (h₁ : ∑ k in finset.range 98, u k.succ = 137) :
  ∑ k in finset.range 49, u (2 * k.succ) = 93 :=
begin
  rw finset.sum_eq_multiset_sum,
  dsimp [finset.range] at h₁,
  simp [h₀],
  ring,
  norm_num at h₁,
  norm_num,
  apply eq_of_sub_eq_zero,
  { simp only [*, abs_of_pos, add_zero] at *, linarith },
end

Problem 6

Adapted from IMO Longlist 1990 Problem 77

For $a, b, c$ reals, prove that $(a^2 + ab + b^2)(b^2 + bc + c^2)(c^2 + ca + a^2) \geq (ab + bc + ca)^3$.

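theorem imo_longlist_1990_p77
  (a b c : ℝ) :
  (a * b + b * c + c * a)^3 ≤
    (a^2 + a * b + b^2) * (b^2 + b * c + c^2) * (c^2 + c * a + a^2) :=
begin
  -- The first steps apply the Cauchy-Schwarz inequality to the
  -- vectors u = (a, b) and v = (b, c); the resulting bound `h₀`
  -- is used by the final call to `nlinarith`.
  let u : euclidean_space ℝ (fin 2) := ![a, b],
  let v : euclidean_space ℝ (fin 2) := ![b, c],
  have h₀ := real_inner_mul_inner_self_le u v,
  simp [u, v, fin.sum_univ_succ,
        ←pow_two, ←pow_two, le_of_lt, mul_assoc] at h₀,
  -- The model introduces a further cut: it states the term
  -- `0 ≤ (c + a) * (c + a)` and proves it.
  have h₃ : 0 ≤ (c + a) * (c + a),
  { nlinarith, },
  have h₄ := sq_nonneg (a * b + b * c + c * a),
  simp [sq, h₀, h₃, mul_add, add_mul] at h₄ ⊢,
  nlinarith [sq_nonneg (b - a),
             sq_nonneg (c - b),
             sq_nonneg (a - c)]
end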
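The role of the two vectors in this proof can be read off by writing out the Cauchy-Schwarz inequality in coordinates. The derivation below is an informal reading of what `h₀` should amount to, assuming the `simp` call reduces the inner products on `fin 2` to coordinate sums:

```latex
\begin{align*}
\langle u, v \rangle^2 &\le \langle u, u \rangle \, \langle v, v \rangle
  && \text{(Cauchy-Schwarz)} \\
(ab + bc)^2 &\le (a^2 + b^2)(b^2 + c^2)
  && \text{with } u = (a, b),\; v = (b, c).
\end{align*}
```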

Formal mathematics involves two main challenges that make a naive application of reinforcement learning unlikely to succeed.

  • (i) Infinite action space: not only does formal mathematics have an extremely large search space (like Go, for example), it also has an infinite action space. At each step of a proof search, the model must choose not from a well-behaved finite set of actions, but from a complex and infinite set of tactics, involving exogenous mathematical terms that have to be generated (e.g., generating a mathematical statement to be used as a witness, an object used in steps such as “there exists an $x$ s.t. …”, or a cut, the introduction and the chaining of a lemma in the middle of a proof).
  • (ii) Lack of self-play: unlike in 2-player games, a prover is not playing against an opponent but against a set of statements to prove. When faced with a statement that is just too hard, there is no obvious reframing that will let the prover generate intermediary easier statements to tackle first. This asymmetry prevents naive application of the self-play algorithms that were successful with 2-player games.

In our work, we address the infinite action space problem by sampling actions from a language model as we search for a proof. Language models have the capability to generate the tactic calls as well as the original mathematical terms often required as arguments. Our basis for addressing the lack of self-play is the observation that the key role of self-play in 2-player games is to provide an unsupervised curriculum. Our methodology proposes to replace this unsupervised curriculum with an auxiliary set of problem statements (without requiring proofs) of varying difficulty. We empirically show that, when the difficulty of these auxiliary problems is varied enough, our training procedure is able to solve a curriculum of increasingly difficult problems, eventually generalizing to the set of problems we care about.
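To make the idea of proposing actions from a model during proof search concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the "formal system" is a toy in which a goal is a non-negative integer and the tactic `sub k` replaces goal `n` with `n - k`, `toy_model` is a hypothetical stand-in for a language model that returns a fixed list of scored candidates, and best-first search by cumulative log-probability is one plausible search strategy, not necessarily the one used in this work.

```python
import heapq
import math
from typing import Callable, List, Optional, Tuple

# Toy "formal system": a goal is a non-negative integer. The tactic `sub k`
# turns goal n into n - k and is accepted only if 0 < k <= n. The proof is
# finished when the goal is 0. Since k may be any positive integer, the set
# of possible actions is infinite.
Action = Tuple[int, float]  # (argument k, log-probability assigned by the model)


def toy_model(goal: int) -> List[Action]:
    """Hypothetical stand-in for a language model proposing tactic arguments.

    It ignores the goal and always proposes the same three candidates.
    """
    return [(1, math.log(0.5)), (3, math.log(0.3)), (7, math.log(0.2))]


def best_first_search(
    goal: int,
    propose: Callable[[int], List[Action]],
    max_expansions: int = 100,
) -> Optional[List[str]]:
    """Expand the most likely partial proof first; return the tactic steps or None."""
    # Queue entries: (negated cumulative log-prob, tie-breaker, remaining goal, steps).
    queue: List[Tuple[float, int, int, List[str]]] = [(0.0, 0, goal, [])]
    counter = 1
    for _ in range(max_expansions):
        if not queue:
            return None
        cost, _, current, steps = heapq.heappop(queue)
        if current == 0:
            return steps  # nothing is left to prove
        for k, logp in propose(current):
            if 0 < k <= current:  # the formal system rejects invalid tactic calls
                heapq.heappush(
                    queue, (cost - logp, counter, current - k, steps + [f"sub {k}"])
                )
                counter += 1
    return None


proof = best_first_search(10, toy_model)
no_proof = best_first_search(5, lambda g: [(g + 1, 0.0)])  # only invalid proposals
```

The search never enumerates the infinite action set: it only considers the handful of candidates the model proposes at each step and lets the checker discard the invalid ones. A model that proposes nothing useful (as in `no_proof`) simply fails to find a proof within the budget.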

While these results are extremely exciting, as they demonstrate that deep learning models are capable of non-trivial mathematical reasoning when interacting with a formal system, we are still very far from best-student performance on these competitions, only occasionally, rather than consistently, closing challenging olympiad problems. We hope nonetheless that our work will motivate research in this domain, in particular towards the IMO Grand Challenge, and that the statement curriculum learning methodology we propose will help accelerate progress in automated reasoning in general.
