INDEX
Explanations
personal pronouns and expressions of self-reference
Negative Logits
CHANT        -0.15
punt         -0.15
ód           -0.15
Böl          -0.15
atem         -0.14
á»IJ         -0.14
ATEGORIES    -0.14
udget        -0.14
athe         -0.14
.intellij    -0.14
Positive Logits
oen          0.18
aign         0.15
ingular      0.15
-angular     0.14
ÙĪØŃ         0.14
رÙĪØ¬       0.14
acs          0.14
igate        0.13
usp          0.13
opsy         0.13
Activations Density: 0.076%