Explanations
phrases related to making demands or complaints
expressions of humor and demands in discourse
Negative Logits
ingham    -0.86
por       -0.82
acion     -0.74
ability   -0.73
lat       -0.72
abet      -0.72
amen      -0.71
backs     -0.70
erning    -0.69
able      -0.68
Positive Logits
ĸļ          0.99
nesday      0.92
uled        0.83
aloud       0.79
Parenthood  0.75
teased      0.73
showc       0.73
repeatedly  0.73
sarcast     0.71
EStream     0.69
Activations Density: 0.077%