INDEX
Explanations
questions and statements related to uncertainty or criticism
negative questions and statements
Negative Logits
Token          Logit
rehend         -0.57
ablishment     -0.57
Kun            -0.57
osate          -0.56
iatrics        -0.56
Sapp           -0.55
VERTISEMENT    -0.54
ileged         -0.54
Success        -0.53
Tropical       -0.53
Positive Logits
Token              Logit
pecially           0.90
cause              0.89
preferring         0.82
etheless           0.75
_>                 0.71
especially         0.69
suggesting         0.68
but                0.66
akin               0.65
notwithstanding    0.65
Activations Density 0.401%