INDEX
Explanations
phrases indicating certainty or emphasis
instances of negation or refusal in various contexts
Negative Logits
Token               Logit
RAD                 -0.51
guiActiveUnfocused  -0.51
creen               -0.49
scattering          -0.48
shroud              -0.48
cottage             -0.46
lodging             -0.46
Manhattan           -0.46
scatter             -0.46
shack               -0.46
Positive Logits
Token               Logit
¬                   0.82
£                   0.78
¡                   0.77
¹                   0.77
Ĵ                   0.77
ı                   0.74
¼                   0.74
º                   0.73
¢                   0.71
ł                   0.69
Activations Density: 0.516%