Explanations
references to likability and shared traits among individuals
Negative Logits
Token      Logit
acades     -0.17
like       -0.16
yme        -0.15
ntl        -0.15
ocker      -0.15
rou        -0.15
AGMA       -0.15
يج         -0.15
_COMPAT    -0.15
вич        -0.14
Positive Logits
Token      Logit
-minded    0.32
minded     0.29
unto       0.27
able       0.23
WISE       0.19
iliki      0.19
abled      0.19
ewise      0.18
uset       0.18
/dis       0.16
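Lists of positive and negative logits like these are commonly produced by projecting the feature's decoder direction through the model's unembedding matrix, then taking the tokens with the largest and smallest scores. The sketch below shows that projection under stated assumptions: the function name `feature_logit_effects`, the argument layout (unembedding stored as `[d_model][vocab_size]`), and the omission of any final LayerNorm are all hypothetical and not taken from this page.

```python
from typing import List, Sequence, Tuple


def feature_logit_effects(
    decoder_dir: Sequence[float],
    w_unembed: Sequence[Sequence[float]],  # assumed shape: [d_model][vocab_size]
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Hypothetical sketch: direct logit effect of one feature on every token.

    Returns (positive, negative): the k most-promoted tokens in descending
    order and the k most-suppressed tokens in ascending order.
    Assumes no final LayerNorm is applied before unembedding.
    """
    d_model = len(decoder_dir)
    # Effect on token t = sum over residual dims i of d[i] * W_U[i][t].
    effects = [
        sum(decoder_dir[i] * w_unembed[i][t] for i in range(d_model))
        for t in range(len(vocab))
    ]
    # Sort ascending by effect, pairing each score with its token string.
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return positive, negative


# Toy usage with a 2-dimensional residual stream and a 4-token vocabulary.
vocab = ["minded", "able", "like", "acades"]
decoder_dir = [1.0, 0.5]
w_unembed = [
    [0.2, 0.1, -0.1, -0.2],
    [0.2, 0.2, 0.0, 0.1],
]
pos, neg = feature_logit_effects(decoder_dir, w_unembed, vocab, k=2)
print([tok for tok, _ in pos], [tok for tok, _ in neg])
```

The toy numbers are invented purely to exercise the function; they are not the values of this feature.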
Activation density: 0.067%
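Activation density is usually read as the fraction of tokens on which the feature fires at all, so 0.067% (0.00067) corresponds to roughly one token in 1,500. A minimal sketch of that computation, assuming "fires" means an activation strictly above zero; the function name and threshold parameter are hypothetical:

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Hypothetical sketch: fraction of tokens whose activation exceeds `threshold`."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    # Guard against an empty input to avoid division by zero.
    return active / total if total else 0.0


# Toy usage: 2 active tokens out of 3,000 (made-up activations).
acts = [0.0] * 2998 + [1.2, 0.4]
density = activation_density(acts)
print(f"{100 * density:.3f}%")
```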