I’ve been working on some AI Safety research. It’s kinda dense for a blog, so I’m hosting elsewhere.
It’s investigation into how ML models do boolean at the most fundamental level. Under an assumption of feature sparsity, which is common for large models, certain patterns appear.