| | Sidestepping Evaluation Awareness and Anticipating Misalignment (alignment.openai.com) |
| 1 point by taubek 47 days ago | past |
|
| | Sidestepping Evaluation Awareness and Anticipating Misalignment with Evaluations (alignment.openai.com) |
| 3 points by michaefe 49 days ago | past |
|
| | Why We Are Excited About Confessions (alignment.openai.com) |
| 2 points by fdeage 66 days ago | past |
|
| | We Are Excited About Confessions (alignment.openai.com) |
| 2 points by gwintrob 69 days ago | past |
|
| | We Are Excited About Confessions (alignment.openai.com) |
| 4 points by TMWNN 71 days ago | past |
|
| | A Practical Approach to Verifying Code at Scale (alignment.openai.com) |
| 1 point by gmays 3 months ago | past |
|
| | Debugging misaligned completions with sparse-autoencoder latent attribution (alignment.openai.com) |
| 1 point by gmays 3 months ago | past |
|
| | Alignment Research Blog (alignment.openai.com) |
| 2 points by ironyman 3 months ago | past |
|
| | Debugging misaligned completions with sparse-autoencoder latent attribution (alignment.openai.com) |
| 1 point by rd 3 months ago | past |
|