INDEX
Explanations
references to trust and relationships with individuals or organizations
Negative Logits
interesse     -0.15
charm         -0.15
ÑģÑİ          -0.15
quist         -0.14
856           -0.14
ilon          -0.14
stanov        -0.14
.touches      -0.14
ÄĽle          -0.14
tesy          -0.14
Positive Logits
implicitly    0.24
implicit      0.19
judgment      0.19
implicitly    0.18
abilities     0.18
ereo          0.16
judgement     0.16
Implicit      0.16
Implicit      0.15
implicit      0.15
Activations Density 0.089%