Explanations
expressions indicating a preference or tendency towards a particular action
phrases related to willingness or tendency
Negative Logits
Token       Logit
UU          -0.75
Nation      -0.68
mel         -0.67
oufl        -0.67
Drama       -0.67
Semin       -0.66
ARDS        -0.65
Boom        -0.65
ARD         -0.64
recorded    -0.64
Positive Logits
Token         Logit
inclined      1.47
inclination   1.18
guiActiveUn   0.93
tempted       0.93
¿½            0.85
ĺħ            0.84
incl          0.84
suscept       0.84
lihood        0.82
userc         0.81
Activations Density: 0.007%