INDEX
Explanations
nouns and terms related to communities, professions, and group identities
Negative Logits
imson      -0.16
Ïħνα       -0.15
arcy       -0.15
ugh        -0.14
acho       -0.14
Phó        -0.14
_are       -0.14
recated    -0.14
ANGED      -0.13
meyi       -0.13
Positive Logits
eye        0.22
head       0.22
hit        0.20
call       0.19
ponder     0.18
brace      0.18
rack       0.18
target     0.17
score      0.17
say        0.17
Activations Density: 0.145%