Explanations
references to groups or categories of individuals or entities
Negative Logits
ple           -0.06
(OP           -0.06
sociálnÃŃ     -0.06
æ§            -0.06
ÑĥлÑıÑĢ       -0.06
aque          -0.06
Guys          -0.06
Strikes       -0.06
iti           -0.06
zer           -0.06
Positive Logits
alike         0.17
similar       0.11
others        0.10
simil         0.09
others        0.09
similar       0.08
aille         0.08
comparable    0.08
similarly     0.07
odb           0.07
Activations Density 0.031%