Explanations
personal pronouns and verb conjugations
words related to choices and consequences
Negative Logits
Token       Logit
Jou         -0.68
CSI         -0.67
axis        -0.66
Cologne     -0.65
Jasper      -0.65
atem        -0.65
Kem         -0.64
Helic       -0.64
Seahawks    -0.63
JA          -0.63
Positive Logits
Token       Logit
not         1.16
not         1.10
no          0.98
nonex       0.98
nt          0.97
lessness    0.96
NOT         0.94
NOT         0.94
Not         0.91
never       0.90
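Lists like the two above are typically the extremes of a per-token score vector. The following is a minimal sketch of that ranking step, not the dashboard's actual code: the function name `top_and_bottom` and the dictionary layout are assumptions made for illustration, and only a handful of scores copied from the lists are used rather than a full vocabulary.

```python
import heapq


def top_and_bottom(logit_effects, k):
    """Return the k highest and k lowest (token, score) pairs.

    `logit_effects` maps a token string to its score. This helper is
    hypothetical and only illustrates the ranking step.
    """
    items = list(logit_effects.items())
    positive = heapq.nlargest(k, items, key=lambda kv: kv[1])
    negative = heapq.nsmallest(k, items, key=lambda kv: kv[1])
    return positive, negative


# A few scores copied from the lists above; a real ranking would cover
# every token in the vocabulary.
logit_effects = {
    "never": 0.90,
    "no": 0.98,
    "lessness": 0.96,
    "Jou": -0.68,
    "CSI": -0.67,
    "axis": -0.66,
}

positive, negative = top_and_bottom(logit_effects, k=2)
for token, score in positive + negative:
    print(f"{token:<10}{score:+.2f}")
```

A dictionary keyed by token string cannot hold two entries that print identically, so a full implementation would more likely key on token id.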
Activations Density 0.179%
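The density figure can be read as the share of token positions on which the feature fires at all. The sketch below assumes that definition (activation strictly above a threshold of zero); the function name `activation_density` and the sample data are hypothetical and chosen only so that the arithmetic reproduces a value of the same form.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of positions whose activation exceeds `threshold`.

    Assumed definition, not confirmed by the source.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 179 active positions out of 100,000.
sample = [1.0] * 179 + [0.0] * 99_821
density = activation_density(sample)
print(f"{density:.3%}")
```

A density this low means the feature is silent on the vast majority of tokens, which is the usual situation for sparse features.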