Explanations
words and phrases describing specific attitudes or behaviors related to willingness or refusal
expressions related to willingness and unwillingness in decision-making or actions
Negative Logits
Token             Logit
umenthal          -0.63
=-=-=-=-=-=-=-=-  -0.55
oppable           -0.55
Genie             -0.55
ordon             -0.51
poon              -0.51
ugu               -0.51
oval              -0.51
glomer            -0.51
mson              -0.50
Positive Logits
Token             Logit
to                1.22
thereto           0.92
To                0.84
to                0.83
To                0.83
towards           0.83
toward            0.82
TO                0.81
lessness          0.78
ta                0.77
Activations Density: 0.099%