Explanations
questions
questions in the text
Negative Logits
spilled       -0.88
healed        -0.73
onite         -0.72
soaked        -0.70
rek           -0.70
urger         -0.69
offending     -0.69
discharged    -0.68
exc           -0.68
alled         -0.68
Positive Logits
¶             1.18
Why           1.09
Answer        1.04
Where         1.03
Nope          1.02
Consider      1.01
Simply        0.99
[/            0.98
Does          0.98
What          0.97
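The two logit lists rank output tokens by how strongly this feature pushes their logits down or up. A minimal sketch of how such a ranking could be produced follows. It assumes that a token's value is the dot product of the feature's decoder direction with that token's unembedding column, ignoring the final layer norm. The function names `logit_weights` and `top_k`, along with the toy 2-dimensional vectors, are hypothetical and exist only for illustration.

```python
# Hypothetical sketch: rank tokens by a feature's direct effect on the logits.
# Assumption: effect on token t = dot(decoder_direction, unembedding column for t),
# ignoring the final layer norm.

def logit_weights(decoder_direction, unembed):
    """unembed maps token -> unembedding vector (hypothetical toy layout)."""
    return {
        tok: sum(d * u for d, u in zip(decoder_direction, vec))
        for tok, vec in unembed.items()
    }

def top_k(weights, k, largest=True):
    """Return the k (token, weight) pairs with the largest (or smallest) weights."""
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=largest)[:k]

# Toy usage with made-up 2-d vectors (not real model weights).
direction = [1.0, 0.5]
unembed = {"Why": [1.0, 0.2], "spilled": [-0.8, -0.1], "Answer": [0.9, 0.3]}
w = logit_weights(direction, unembed)
print(top_k(w, 2))                 # most positive tokens
print(top_k(w, 1, largest=False))  # most negative token
```

Sorting the whole vocabulary is fine for a sketch. With a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nlargest` would avoid a full sort.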
Activation Density: 0.090%
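Activation density is read here as the fraction of token positions on which the feature fires. A minimal sketch follows, assuming "fires" means an activation strictly greater than zero. The threshold parameter and the function name `activation_density` are hypothetical. Under this reading, 0.090% corresponds to about 9 firing positions per 10,000 tokens.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the activation exceeds `threshold`.

    Assumption: a feature "fires" when its activation is strictly above 0.
    """
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)

# Toy usage: 9 firing positions out of 10,000 tokens.
acts = [0.0] * 9991 + [0.5] * 9
d = activation_density(acts)
print(f"{d:.3%}")
```

A density this low means the feature is silent on almost every token, which makes it a sparse feature.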