Explanations
actions that involve physical or emotional interaction with others
actions related to emotional expressions and decision-making
Negative Logits
Token        Logit
umn          -0.64
roma         -0.62
MSN          -0.60
uum          -0.60
Extrem       -0.60
utenberg     -0.59
][           -0.59
>>\          -0.58
olon         -0.57
UGC          -0.57
Positive Logits
Token        Logit
accordingly  0.99
alike        0.88
oneself      0.84
yourself     0.83
ourselves    0.81
yourselves   0.79
versa        0.78
them         0.76
ings         0.74
ables        0.73
Activations Density: 0.393%