INDEX
Explanations
words related to personal interactions and pronouns
Negative Logits
Token       Logit
Threshold   -0.17
sq          -0.15
vil         -0.15
VAS         -0.14
threshold   -0.14
å´          -0.14
Crosby      -0.13
Threshold   -0.13
pan         -0.13
Coleman     -0.13
Positive Logits
Token       Logit
ufen        0.17
ÂĽ          0.15
nger        0.15
roupe       0.14
istra       0.14
онов        0.14
juana       0.14
traffic     0.14
arin        0.14
azing       0.14
Activations Density 0.000%