Explanations
prison sentences and punishments
Negative Logits
optimise    0.43
Qt          0.42
quies       0.42
ottim       0.42
Spinach     0.41
OpenAI      0.41
optimize    0.40
cors        0.40
Optimize    0.40
Neurog      0.40
Positive Logits
sentencing     2.03
Sentencing     1.73
sentences      1.72
punishments    1.72
punishment     1.63
sentenced      1.58
sentence       1.55
sentences      1.55
sentence       1.44
Sentence       1.41
Activations Density 0.056%