INDEX
Explanations
phrases indicating goals or intended outcomes
Negative Logits
Token      Logit
Schroeder  -0.70
The        -0.65
ogeneous   -0.61
'          -0.59
"          -0.59
footnote   -0.58
↵          -0.58
<eos>      -0.58
C          -0.57
den        -0.56
Positive Logits
Token      Logit
aim        3.11
Aim        2.96
Aim        2.89
aim        2.76
Aims       2.61
aims       2.58
AIM        2.28
Aims       2.28
aiming     2.25
aimed      2.18
Activations Density 0.059%