Explanations
phrases indicating different physical and emotional states or activities of a person
expressions of emotional intensity or strong feelings
Negative Logits
Token           Logit
attRot          -0.66
Methods         -0.65
azard           -0.65
selves          -0.64
unison          -0.64
arser           -0.61
respectively    -0.61
TPS             -0.58
their           -0.58
jointly         -0.57
Positive Logits
Token           Logit
himself         1.33
Himself         1.10
herself         1.08
his             1.03
tonight         0.93
His             0.93
HIS             0.93
his             0.87
His             0.86
Wife            0.78
Activations Density: 0.791%