INDEX
Explanations
elements related to social behavior and relationships
Negative Logits
udi           -0.15
emplate       -0.14
аков          -0.14
shal          -0.14
avit          -0.13
istrovství    -0.13
lay           -0.13
ocht          -0.13
oren          -0.13
YES           -0.13
Positive Logits
eyen          0.15
DITION        0.15
udur          0.15
iyon          0.14
abouts        0.14
rál           0.13
sdale         0.13
оби           0.13
iny           0.13
DonaldTrump   0.13
Activations Density 0.360%