Edge of Tomorrow Algorithms

James Braza
August 20th, 2025

Palm Springs movie image

Palm Springs – Nyles and others loop after entering a portal

Groundhog Day movie

Groundhog Day – Phil relives the same day until he becomes a better person

Happy Death Day movie

Happy Death Day – Tree relives the same day until she survives

William Cage: I don't know. We've never gotten this far.

Edge of Tomorrow movie

Edge of Tomorrow – Cage repeats the same day until an alien invasion succeeds or fails

Rita Vrataski: What do we do now?

Data Flywheels

Expert Iteration (EI)

Algorithm 1

Goal: repeatedly supervised fine-tune a model on a prior model's correct outputs

Initial policy π_1

Starting policy π_i

Batch of rollouts: (Prompt_1, Completion, Reward), ..., (Prompt_B, Completion, Reward)

Dataset D_i = data with reward > 0

Supervised Fine Tuning (SFT) on D_i

Updated policy π_{i+1}

N loops

Final policy π_N
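The Expert Iteration loop above can be sketched in a few lines of Python. This is a minimal toy under stated assumptions, not the aviary implementation: `Rollout`, `keep_rewarded`, `expert_iteration`, `toy_rollout`, and `toy_sft` are hypothetical names, the "policy" is just a dictionary of memorized answers, and "SFT" is modeled as memorizing the rewarded completions.

```python
import random
from typing import NamedTuple


class Rollout(NamedTuple):
    prompt: str
    completion: str
    reward: float


def keep_rewarded(batch):
    """Build D_i: keep only the rollouts with reward > 0."""
    return [r for r in batch if r.reward > 0]


def expert_iteration(policy, prompts, rollout, sft, n_loops):
    """Run N loops of: roll out, filter by reward, fine-tune on the survivors."""
    for _ in range(n_loops):
        batch = [rollout(policy, p) for p in prompts]  # batch of B rollouts
        dataset = keep_rewarded(batch)                 # D_i
        policy = sft(policy, dataset)                  # updated policy
    return policy


# Toy setup (hypothetical): the policy is a dict of memorized answers.
ANSWERS = {"1+1": "2", "2+2": "4", "3+3": "6"}
rng = random.Random(0)


def toy_rollout(policy, prompt):
    # Use the memorized answer if there is one, otherwise guess a digit.
    completion = policy.get(prompt) or rng.choice("0123456789")
    return Rollout(prompt, completion, float(completion == ANSWERS[prompt]))


def toy_sft(policy, dataset):
    # Stand-in for SFT: memorize every rewarded completion.
    return {**policy, **{r.prompt: r.completion for r in dataset}}


final_policy = expert_iteration({}, list(ANSWERS), toy_rollout, toy_sft, n_loops=50)
```

Because only rewarded rollouts reach the fine-tuning step, everything the toy policy memorizes is correct by construction, which is the property the filter `reward > 0` is meant to provide.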

aviary paper: EI enables an 8B model to surpass frontier models

Figure 4 (panels A and B, with label and legend) from aviary paper

Expert Iteration Reinforcement Learning (EIRL)

Algorithm 2

Use RL to progressively improve the starting model

Initial policy π_1

Dataset D_{i-1} (Initial Dataset D_0 when i=1)

SFT on D_{i-1}

Starting policy π_i

Reinforcement Learning w/Verifiable Rewards (RLVR)

Batch of rollouts: (Prompt_1, Completion, Reward), ..., (Prompt_B, Completion, Reward)

Batch of rollouts: (Prompt_{B+1}, Completion, Reward), ..., (Prompt_{2B}, Completion, Reward)

...

Updated policy π_{i+1}

Dataset D_i = data with reward > 0

N loops

Final policy π_N

ether0 paper: used N=2 with multitask learning

Figure 1 from ether0 paper
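The control flow of the EIRL loop can be sketched as follows. This is only an illustration under assumptions: `eirl`, `toy_sft`, and `toy_rlvr` are hypothetical names, the SFT and RLVR steps are passed in as plain callables, and the sketch assumes SFT in each loop is applied to the most recent policy (the slide does not say whether it restarts from the initial policy instead).

```python
from typing import NamedTuple


class Rollout(NamedTuple):
    prompt: str
    completion: str
    reward: float


def eirl(initial_policy, initial_dataset, sft, rlvr, n_loops):
    """N loops of: SFT on D_{i-1}, then RLVR, then rebuild D_i from rewarded rollouts."""
    policy, dataset = initial_policy, initial_dataset  # pi_1 and D_0
    for _ in range(n_loops):
        starting_policy = sft(policy, dataset)         # SFT on D_{i-1}
        policy, rollouts = rlvr(starting_policy)       # RLVR over batches of rollouts
        dataset = [r for r in rollouts if r.reward > 0]  # D_i = data with reward > 0
    return policy


# Toy stand-ins (hypothetical): the "policy" is a trace of the training steps applied.
def toy_sft(policy, dataset):
    return policy + [f"sft:{len(dataset)}"]


def toy_rlvr(policy):
    rollouts = [Rollout("p1", "good", 1.0), Rollout("p2", "bad", 0.0)]
    return policy + ["rlvr"], rollouts


trace = eirl([], [Rollout("p0", "seed", 1.0)], toy_sft, toy_rlvr, n_loops=2)
```

With `n_loops=2`, matching the N=2 reported for ether0, the trace alternates SFT and RLVR twice, and the second SFT step sees only the one rewarded rollout from the first RLVR phase.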

Data Selection

Advantage-Based Curriculum Learning

Algorithm 3

Buffer problem difficulty reusing GRPO groups

(Prior) RLVR_{i-1}: Build Buffer

RLVR_{i-1} rollout: Prompt_j

Group_j: Completion_{j,1}, Completion_{j,2}, ..., Completion_{j,G}

Advantage mixed: Non-trivial (learnable) prompts

Advantage all 0: Trivial (too easy or hard) prompts

(Current) RLVR_i: Spend Buffer

RL with learnable problems

Policy π_i

Non-trivial (learnable) prompts

RLVR_i rollout: Prompt_j

Sampling weights: 1-ε, ε
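The build-buffer and spend-buffer phases can be sketched as below. This is a hedged toy, not the ether0 implementation: all function names are hypothetical, advantages are simply rewards minus the group mean (GRPO typically also normalizes by the group standard deviation), and the sketch assumes that ε is the probability of drawing from the full dataset while 1-ε draws from the buffer, which the slide does not specify.

```python
import random
from statistics import mean


def group_advantages(rewards):
    """Mean-centered advantages for one group of G completions."""
    baseline = mean(rewards)
    return [r - baseline for r in rewards]


def is_learnable(rewards):
    """Mixed advantages mean learnable; all-zero advantages mean too easy or too hard."""
    return any(a != 0 for a in group_advantages(rewards))


def build_buffer(groups):
    """(Prior) RLVR_{i-1}: keep prompts whose group had mixed advantages."""
    return [prompt for prompt, rewards in groups.items() if is_learnable(rewards)]


def sample_prompt(buffer, dataset, epsilon, rng):
    """(Current) RLVR_i: spend the buffer, with an assumed epsilon fallback to the dataset."""
    if not buffer or rng.random() < epsilon:
        return rng.choice(dataset)
    return rng.choice(buffer)


# Rewards per prompt from a prior rollout, one group of G=4 completions each.
prior_groups = {
    "easy": [1.0, 1.0, 1.0, 1.0],       # all correct: advantages all 0
    "hard": [0.0, 0.0, 0.0, 0.0],       # all wrong: advantages all 0
    "learnable": [1.0, 0.0, 0.0, 1.0],  # mixed: non-zero advantages
}
buffer = build_buffer(prior_groups)
rng = random.Random(0)
next_prompt = sample_prompt(buffer, list(prior_groups), epsilon=0.1, rng=rng)
```

Reusing the group rewards that GRPO already computes means the difficulty filter needs no extra rollouts: a prompt whose completions all receive the same reward contributes no gradient signal, so skipping it in the next phase loses nothing.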

Thank You

Contact FutureHouse

hello@futurehouse.org

Edge of Tomorrow algorithms LaTeX algorithmic

All Algorithms

Shoutout to Siddharth Narayanan and Andrew White for their feedback and support