I’m in the midst of doing the MATS program which has kept me super busy, but that didn’t stop me working on resolving the most important question of our time: What Hogwarts House does your chatbot belong to?
Year: 2025
My Failed AI Safety Research Projects (Q1/Q2 2025)
This year I’ve been on sabbatical, and have spent my time upskilling in AI Safety. Part of that is doing independent research projects in different fields.
Some of those items have resulted in useful output, notably A Toy Model of the U-AND Problem, Do No Harm? and SAEs and their Variants.
And then there are others where I’ve just failed fast and moved on.
I’ve detailed those projects that still have something to say, even if it’s mostly negative results.
Find them on LessWrong.
A Technique of Pure Reason
Looking a little ahead into the future, I think LLMs are going to stop being knowledgeable, articulate chatbots first and foremost, and instead become more efficient models that are weaker in these areas than current models but relatively stronger at reasoning: pure-reasoner models. The rest will be bolted on via tool-use and other scaffolding.
Fiddling Weights with Temperature
In procedural generation, the absolute simplest, most common technique is randomly picking an item from a list. More often than not, it is a weighted random choice, where each item is selected with a frequency proportional to its “weight”, a numerical value that can be tweaked by the designer.
import random

def random_choice(items: list, weights: list[float]):
    total_weight = sum(weights)
    random_value = random.random() * total_weight
    # Find the item that corresponds to the random number
    for i, weight in enumerate(weights):
        random_value -= weight
        if random_value <= 0:
            return items[i]
    # Floating-point rounding can leave a tiny remainder; fall back to the last item
    return items[-1]
I wanted to share a technique from the Machine Learning community, called temperature. It is a simple enhancement to this routine that is far more convenient than tweaking weights directly.
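As a rough sketch of the idea, one common way to apply a temperature is to raise each weight to the power 1/temperature before sampling. This is an assumption about the formulation rather than a quote from the full article, and the function names `apply_temperature` and `random_choice_with_temperature` are made up for illustration. A temperature of 1 leaves the distribution unchanged, higher values flatten it, and lower values favour the heaviest items.

```python
import random


def apply_temperature(weights: list[float], temperature: float) -> list[float]:
    """Return weights proportional to w ** (1 / temperature)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    top = max(weights)
    if top <= 0:
        raise ValueError("at least one weight must be positive")
    # Dividing by the largest weight first keeps every value in [0, 1],
    # so very low temperatures underflow towards 0 instead of overflowing.
    return [(w / top) ** (1.0 / temperature) for w in weights]


def random_choice_with_temperature(items, weights, temperature=1.0, rng=random):
    """Weighted random choice after reshaping the weights by temperature."""
    adjusted = apply_temperature(weights, temperature)
    return rng.choices(items, weights=adjusted, k=1)[0]


if __name__ == "__main__":
    items = ["common", "rare"]
    weights = [4.0, 1.0]
    for t in (0.5, 1.0, 2.0):
        print(t, apply_temperature(weights, t))
```

The appeal is that a single number reshapes the whole distribution, so the individual weights can stay as they were authored.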
The A/B Not Taken
Thank you for your interest in FrostGPT™.
Rate which response you like the most?
Response 1
Two roads diverged in a yellow wood,
And sorry I could not travel both
And be one traveler, long I stood
And looked down one as far as I could
To where it bent in the undergrowth;
Then took the other, as just as fair,
And having perhaps the better claim,
Because it was grassy and wanted wear;
Though as for that the passing there
Had worn them really about the same,
And both that morning equally lay
In leaves no step had trodden black.
Oh, I kept the first for another day!
Yet knowing how way leads on to way,
I doubted if I should ever come back.
I shall be telling this with a sigh
Somewhere ages and ages hence:
Two roads diverged in a wood, and I—
I took the one less traveled by,
And that has made all the difference.
Response 2
Two paths diverged in an autumn wood,
And sorry I could not journey both
And be one wanderer, long I stood
And gazed down one as far as I could
To where it curved in the undergrowth;
Then saw another, as equally fair,
And having perhaps the stronger claim,
Because it was verdant and needed wear;
Though as for that the passing there
Had worn them truly about the same,
And both that morning equally lay
In leaves no foot had trodden black.
Oh, I saved the second for a future day!
Yet knowing how path leads on to path,
I questioned if I should ever come back.
I shall be sharing this with a sigh
Somewhere decades and decades hence:
Two paths diverged in a wood, and I—
I took the one more traveled by,
And that has made all the difference.
Exploring Rectangle Subdivisions
Last week, I saw a talk on Vuntra City, a game with a procedurally generated, fully explorable city. Developer Larissa Davidova explained that she settled on using Recursive Subdivision for the city blocks, as she wanted some level of organicness, while still only having to deal with rectangles. But she didn’t like having indefinitely long roads that cause implausible sightlines.
One way Vuntra City handles this is by subdividing a rectangle into 5 blocks, a pattern I called “whirl” in my previous article on recursive subdivision. You can see that it has no internal roads that stretch across the entire map.
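To make the whirl pattern concrete, here is a minimal sketch of one way to split a rectangle into five blocks arranged as a pinwheel around a central block. It is not Vuntra City's actual code: the `Rect` type, the `whirl` function and the `inset` parameter are all hypothetical names chosen for illustration.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    x: float
    y: float
    w: float
    h: float


def whirl(rect: Rect, inset: float = 0.3) -> list[Rect]:
    """Split rect into five blocks: four arms wrapped around a centre block.

    `inset` is the fraction of the width/height at which the inner cuts sit.
    """
    if not 0.0 < inset < 0.5:
        raise ValueError("inset must be between 0 and 0.5")
    x0, y0, w, h = rect
    x1, x2 = x0 + inset * w, x0 + (1 - inset) * w
    y1, y2 = y0 + inset * h, y0 + (1 - inset) * h
    return [
        Rect(x0, y0, x2 - x0, y1 - y0),          # top arm, stops at x2
        Rect(x2, y0, x0 + w - x2, y2 - y0),      # right arm, stops at y2
        Rect(x1, y2, x0 + w - x1, y0 + h - y2),  # bottom arm, starts at x1
        Rect(x0, y1, x1 - x0, y0 + h - y1),      # left arm, starts at y1
        Rect(x1, y1, x2 - x1, y2 - y1),          # centre block
    ]


if __name__ == "__main__":
    for block in whirl(Rect(0, 0, 10, 20)):
        print(block)
```

Each of the four inner cuts ends where it meets a perpendicular cut, which is why none of them spans the whole rectangle.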

But Larissa’s talk got me thinking. The whirl pattern is interesting because it cannot be made from simple cuts. What other ways of subdividing a rectangle into smaller rectangles are out there?
TransformerLens Quick Reference
TransformerLens is a Python library for Mechanistic Interpretability. It’s got some great tutorials… but they are all kinda verbose. Here’s a cheatsheet of all the common things you’ll want from the library. Click the links for more details.
Computational Superposition in a Toy Model of the U-AND Problem
I’ve been working on some AI Safety research. It’s kinda dense for a blog, so I’m hosting elsewhere.
It’s an investigation into how ML models do boolean logic at the most fundamental level. Under an assumption of feature sparsity, which is common for large models, certain patterns appear.
Running Tracery bots with LLMs
Tracery bots were a fun, simple way of making generative texts. They are basically an easy way to specify generative grammars via a simple JSON file format. There used to be a horde of fun little Tracery bots on Twitter until API changes shut them all down.
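For anyone who hasn't seen one, here is a minimal sketch of how a Tracery-style grammar expands. It only handles plain `#symbol#` references starting from the conventional `origin` rule; real Tracery also supports modifiers and actions, which are left out here. The `expand` function and the example grammar are made up for illustration.

```python
import random
import re


def expand(grammar: dict[str, list[str]], text: str = "#origin#", rng=random) -> str:
    """Recursively replace each #symbol# with a randomly chosen rule."""

    def replace(match: re.Match) -> str:
        rule = rng.choice(grammar[match.group(1)])
        return expand(grammar, rule, rng)

    return re.sub(r"#(\w+)#", replace, text)


# The grammar is plain data, so it could equally be loaded from a JSON file.
grammar = {
    "origin": ["The #adjective# #animal# says hello."],
    "adjective": ["sleepy", "curious"],
    "animal": ["cat", "owl"],
}

if __name__ == "__main__":
    print(expand(grammar))
```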



Nowadays, you can prompt a chatbot to get whatever you want. But that lacks the same charm, and it doesn’t give you the control you’d want for something unleashed on the internet. Let’s do something about that.
Outer Wilds Mission Graph
I recently played Outer Wilds and absolutely loved it. But after playing it I realised how little of the game was really on the critical path – most locations fill in lore and deepen your curiosity but don’t actually help you reach any location you couldn’t before. To analyse this, I made a mission graph for the game. Spoilers ahead.
SPOILERS IN FULL ARTICLE