INDEX
Explanations
expressions related to opinions or feelings
expressions of personal responsibility or actions taken by individuals
Negative Logits
idding       -0.79
tree         -0.79
naire        -0.74
dropping     -0.74
arie         -0.73
band         -0.71
Machina      -0.70
ansas        -0.70
arter        -0.68
mingham      -0.68
Positive Logits
pains            1.27
pride            1.25
offence          1.14
pleasure         1.11
cues             1.09
precedence       1.09
offense          1.08
advantage        1.07
comfort          1.05
responsibility   1.05
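The dashboard does not say how the logit scores above are computed. A common convention for SAE feature dashboards is to score each vocabulary token by the dot product of the feature's decoder direction with that token's unembedding vector, then list the largest and smallest values. The sketch below illustrates that assumed convention in plain Python. The function names `logit_effects` and `top_k` and the toy vectors are hypothetical and are not the real model weights.

```python
def logit_effects(decoder_dir, unembed):
    """Score each token by dot(decoder_dir, unembedding vector).

    ASSUMPTION: this dot-product convention is typical for SAE dashboards,
    but the source does not confirm it for this feature.
    """
    return {
        tok: sum(d * u for d, u in zip(decoder_dir, vec))
        for tok, vec in unembed.items()
    }


def top_k(effects, k, largest=True):
    """Return the k tokens with the largest (or smallest) scores."""
    return sorted(effects.items(), key=lambda kv: kv[1], reverse=largest)[:k]


# Hypothetical toy data: a 2-d decoder direction and a 4-token vocabulary.
decoder_dir = [1.0, 0.0]
unembed = {
    "a": [1.25, 0.3],
    "b": [-0.79, 0.5],
    "c": [-0.71, 2.0],
    "d": [1.05, -1.0],
}

effects = logit_effects(decoder_dir, unembed)
positive = top_k(effects, 2, largest=True)   # analogous to "Positive Logits"
negative = top_k(effects, 2, largest=False)  # analogous to "Negative Logits"
```

Sorting the full score dictionary is fine for a toy vocabulary. For a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nlargest` would avoid a full sort.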
Activation Density: 0.082%
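Activation density is usually read as the percentage of token positions on which the feature fires. At 0.082%, that works out to roughly 82 firing positions per 100,000 tokens. The sketch below assumes "fires" means an activation strictly greater than zero. The dashboard does not state its threshold, and the function name `activation_density` is hypothetical.

```python
def activation_density(activations):
    """Percentage of token positions where the feature fires.

    ASSUMPTION: a position counts as firing when its activation is > 0.
    """
    if not activations:
        return 0.0
    firing = sum(1 for a in activations if a > 0)
    return 100.0 * firing / len(activations)


# Hypothetical example: 82 firing positions out of 100,000 tokens.
sample = [1.0] * 82 + [0.0] * 99_918
density = activation_density(sample)
```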