INDEX
Explanations
names of individuals, particularly those associated with popular culture or media
Negative Logits
token   logit
su      -0.19
st      -0.19
pay     -0.17
nit     -0.17
elli    -0.17
lex     -0.17
speed   -0.16
sy      -0.16
ned     -0.16
sch     -0.16
Positive Logits
token   logit
yyyy    0.22
eva     0.22
ean     0.21
yyy     0.20
lic     0.20
lation  0.20
ville   0.20
mania   0.19
tics    0.18
ahoo    0.18
Activations Density: 0.052%