Weekly links, March 5 2026


            
        March 5, 2026
    
    


A lot has gone on with Anthropic lately: their new Responsible Scaling Policy, the DoD, etc.
No links about that this week - instead:
https://alignment.anthropic.com/2026/psm/: Deep explanation of the Persona Selection Model, written in part by Chris Olah (a cofounder of Anthropic).
https://ampcode.com/notes/feedback-loopable: This is super cool. Highly recommend.
https://bounded-regret.ghost.io/oversight-assiturning-compute-into-understanding/: There are many oversight tasks where discovery is hard but verifying a discovery is (relatively) cheap, for example finding bugs in code or spotting reward hacking in RL environments. Can we leverage that asymmetry to train highly powerful "oversight assistants", models that are superhuman at helping us oversee other models?
And I made this:
https://github.com/Julian-Moncarz/watchdog
https://watchdog-nu.vercel.app/
    
