INDEX
Explanations
phrases indicating inability or frustration
Negative Logits
uz          -0.15
useClass    -0.15
MLE         -0.15
uras        -0.14
adb         -0.14
ickers      -0.14
fdb         -0.14
ιλο         -0.14
-svg        -0.14
oload       -0.14
Positive Logits
stomach     0.24
bear        0.20
shake       0.20
contain     0.20
resist      0.19
Shake       0.19
shakes      0.18
stom        0.18
focus       0.18
shaking     0.18
Activations Density: 0.109%