Edge of Tomorrow Algorithms

James Braza
August 20th, 2025

Palm Springs – Nyles and others are stuck in a time loop after entering a portal

Groundhog Day – Phil relives the same day until he becomes a better person

Happy Death Day – Tree relives the same day until she survives

Edge of Tomorrow – Cage repeats the same day until an alien invasion succeeds or fails

Rita Vrataski: What do we do now?
William Cage: I don't know. We've never gotten this far.

Data Flywheels

Expert Iteration (EI)

Algorithm 1

Initial policy π₁
Loop N times, from starting policy π_i:
  Batch of rollouts: (Prompt_1, Completion, Reward), ..., (Prompt_B, Completion, Reward)
  Dataset D_i = rollouts with reward > 0
  Supervised Fine-Tuning (SFT) on D_i → updated policy π_{i+1}
Final policy π_N

Goal: repeatedly fine-tune (SFT) a model on a prior model's correct outputs

Aviary paper: EI enabled an 8B model to surpass frontier models on several of the paper's tasks
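The EI loop can be sketched on a toy problem. This is a minimal sketch under stated assumptions, not the Aviary implementation: the "policy" is a hypothetical table of unnormalized weights per candidate answer, `sft` is a hypothetical stand-in that just upweights kept completions, and `reward_fn` is an exact-match check. Only the structure (sample a batch, keep rollouts with reward > 0, fine-tune on them, repeat N times) comes from Algorithm 1.

```python
import random
from dataclasses import dataclass

# Hypothetical toy policy: prompt -> {candidate completion: unnormalized weight}.
Policy = dict[str, dict[str, float]]


@dataclass
class Rollout:
    prompt: str
    completion: str
    reward: float


def sample(policy: Policy, prompt: str, rng: random.Random) -> str:
    """Sample one completion in proportion to the policy's weights."""
    answers = list(policy[prompt])
    weights = [policy[prompt][a] for a in answers]
    return rng.choices(answers, weights=weights, k=1)[0]


def sft(policy: Policy, dataset: list[Rollout], step: float = 1.0) -> Policy:
    """Toy stand-in for SFT (hypothetical): upweight each completion in the dataset."""
    updated = {p: dict(w) for p, w in policy.items()}
    for r in dataset:
        updated[r.prompt][r.completion] += step
    return updated


def expert_iteration(policy, prompts, reward_fn, n_loops, batch_size, seed=0):
    rng = random.Random(seed)
    for _ in range(n_loops):
        # Batch of rollouts from the starting policy pi_i.
        batch = []
        for _ in range(batch_size):
            prompt = rng.choice(prompts)
            completion = sample(policy, prompt, rng)
            batch.append(Rollout(prompt, completion, reward_fn(prompt, completion)))
        # D_i = data with reward > 0.
        d_i = [r for r in batch if r.reward > 0]
        # SFT on D_i yields the updated policy pi_{i+1}.
        policy = sft(policy, d_i)
    return policy


answers = {"2+2": "4", "3+3": "6"}
pi_1 = {"2+2": {"4": 1.0, "5": 1.0}, "3+3": {"6": 1.0, "7": 1.0}}
pi_final = expert_iteration(
    pi_1, list(answers), lambda p, c: float(c == answers[p]), n_loops=5, batch_size=20
)
print(pi_final)
```

Because only rollouts with reward > 0 reach `sft`, the wrong answers' weights never move. This sketch fine-tunes the latest policy each round; some EI variants instead fine-tune the original base model on each round's data.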

Expert Iteration Reinforcement Learning (EIRL)

Algorithm 2

Initial policy π₁ and initial dataset D₀
Loop N times, from starting policy π_i (the first loop, i = 1, uses D₀):
  SFT on D_{i-1}
  Reinforcement Learning with Verifiable Rewards (RLVR), sampling batches of rollouts: (Prompt_1, Completion, Reward), ..., (Prompt_B, Completion, Reward), then (Prompt_{B+1}, Completion, Reward), ..., (Prompt_{2B}, Completion, Reward), ...
  → updated policy π_{i+1}
  Dataset D_i = RLVR rollouts with reward > 0, used for the next loop's SFT
Final policy π_N

Goal: use RL to progressively improve the starting model

ether0 paper: used N = 2 with multitask learning
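A toy version of the EIRL loop, reusing the same kind of hypothetical weight-table policy, shows how each loop's RLVR rollouts become the next loop's SFT data. This is a sketch under stated assumptions, not the ether0 training code: `rlvr` is a hypothetical stand-in (a multiplicative update by reward minus the batch-mean reward), not GRPO. `n_batches=2` mirrors the two rollout batches drawn in the diagram, and `n_loops=2` mirrors the N = 2 reported for ether0.

```python
import math
import random

# Hypothetical toy policy: prompt -> {candidate completion: unnormalized weight}.
Policy = dict[str, dict[str, float]]
Rollout = tuple[str, str, float]  # (prompt, completion, reward)


def sample(policy: Policy, prompt: str, rng: random.Random) -> str:
    answers = list(policy[prompt])
    return rng.choices(answers, weights=[policy[prompt][a] for a in answers], k=1)[0]


def sft(policy: Policy, dataset: list[Rollout], step: float = 1.0) -> Policy:
    """Toy SFT stand-in (hypothetical): upweight each completion in the dataset."""
    updated = {p: dict(w) for p, w in policy.items()}
    for prompt, completion, _ in dataset:
        updated[prompt][completion] += step
    return updated


def rlvr(policy: Policy, prompts, reward_fn, n_batches, batch_size, rng, lr=0.5):
    """Toy RLVR stand-in (hypothetical): reweight by exp(lr * (reward - batch mean))."""
    policy = {p: dict(w) for p, w in policy.items()}
    rollouts: list[Rollout] = []
    for _ in range(n_batches):
        batch = []
        for _ in range(batch_size):
            prompt = rng.choice(prompts)
            completion = sample(policy, prompt, rng)
            batch.append((prompt, completion, reward_fn(prompt, completion)))
        baseline = sum(r for _, _, r in batch) / len(batch)
        for prompt, completion, reward in batch:
            policy[prompt][completion] *= math.exp(lr * (reward - baseline))
        rollouts.extend(batch)
    return policy, rollouts


def eirl(policy, d_0, prompts, reward_fn, n_loops, n_batches=2, batch_size=8, seed=0):
    rng = random.Random(seed)
    dataset = d_0  # i = 1 starts from the initial dataset D_0
    for _ in range(n_loops):
        policy = sft(policy, dataset)  # SFT on D_{i-1}
        policy, rollouts = rlvr(policy, prompts, reward_fn, n_batches, batch_size, rng)
        dataset = [r for r in rollouts if r[2] > 0]  # D_i = data with reward > 0
    return policy, dataset


answers = {"2+2": "4", "3+3": "6"}
pi_1 = {"2+2": {"4": 1.0, "5": 1.0}, "3+3": {"6": 1.0, "7": 1.0}}
d_0 = [("2+2", "4", 1.0), ("3+3", "6", 1.0)]
pi_final, d_last = eirl(pi_1, d_0, list(answers), lambda p, c: float(c == answers[p]), n_loops=2)
print(pi_final, len(d_last))
```

With a 0/1 reward, a correct completion's weight never shrinks in `rlvr` (its reward is at least the batch mean) and a wrong one's never grows, so the filtered datasets keep pushing the policy toward correct answers.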

Data Selection

Advantage-Based Curriculum Learning

Algorithm 3

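Goal: buffer problem difficulty by reusing GRPO groups, so RL trains on learnable problems

Each rollout of Prompt_j by policy π_i produces a GRPO group, Group_j = Completion_{j,1}, Completion_{j,2}, ..., Completion_{j,G}.

(Prior) RLVR_{i-1}: build buffer
  All advantages 0 → trivial prompt (too easy or too hard): every completion received the same reward, so the group gives no learning signal
  Mixed advantages → non-trivial (learnable) prompt, added to the buffer

(Current) RLVR_i: spend buffer
  RLVR_i rollouts draw Prompt_j from the buffer of non-trivial (learnable) prompts (the diagram labels this draw 1-ε)

ether0 paper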
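The buffer logic can be sketched with plain reward lists. This is a minimal sketch under stated assumptions, not the ether0 implementation: advantages use the GRPO-style group normalization (reward minus group mean, over group standard deviation; real implementations differ in details such as sample vs population std and an epsilon in the denominator). The helper names are hypothetical, and three choices are assumptions: the buffer is drawn from with probability 1 - `epsilon`, "spending" removes a buffered prompt once used, and the fallback draws from the full prompt pool.

```python
import random
import statistics


def group_advantages(rewards: list[float]) -> list[float]:
    """GRPO-style advantages: (reward - group mean) / group std."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0:
        return [0.0] * len(rewards)  # identical rewards: no learning signal
    return [(r - mean) / std for r in rewards]


def build_buffer(groups: dict[str, list[float]]) -> list[str]:
    """Prior RLVR_{i-1}: keep prompts whose GRPO group had mixed advantages."""
    return [
        prompt
        for prompt, rewards in groups.items()
        if any(a != 0.0 for a in group_advantages(rewards))
    ]


def next_prompt(buffer: list[str], all_prompts: list[str], epsilon: float,
                rng: random.Random) -> str:
    """Current RLVR_i: spend the buffer with probability 1 - epsilon (assumed)."""
    if buffer and rng.random() < 1 - epsilon:
        return buffer.pop(rng.randrange(len(buffer)))  # each buffered prompt used once
    return rng.choice(all_prompts)  # assumed fallback: full prompt pool


groups = {
    "easy": [1.0, 1.0, 1.0, 1.0],   # all correct -> trivial
    "hard": [0.0, 0.0, 0.0, 0.0],   # all wrong -> trivial
    "mixed": [1.0, 0.0, 0.0, 1.0],  # mixed -> learnable
}
buffer = build_buffer(groups)
print(buffer)  # ['mixed']
print(next_prompt(buffer, list(groups), epsilon=0.0, rng=random.Random(0)))  # mixed
```

Reusing the prior run's groups means prompt difficulty is judged from rollouts that were already paid for, instead of spending extra samples to re-score each prompt.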

Thank You

Contact FutureHouse

hello@futurehouse.org

Edge of Tomorrow algorithms in LaTeX algorithmic

All Algorithms

Shoutout to Siddharth Narayanan and Andrew White for their feedback and support
