Explanations
exclamations and affirmations
expressions of disagreement or negative responses
Negative Logits
teenth    -1.05
velop     -0.77
icipated  -0.76
ILCS      -0.74
irth      -0.73
States    -0.73
emouth    -0.73
ugal      -0.73
ertility  -0.73
fullest   -0.71
Positive Logits
Nope      1.11
blah      1.00
damned    0.86
Yep       0.85
darn      0.81
Sorry     0.79
bye       0.77
REALLY    0.77
sorry     0.77
dunno     0.76
Activations Density 0.031%