Explanations
questions asked rhetorically or seeking confirmation
rhetorical questions
Negative Logits
slam        -0.74
apan        -0.72
encount     -0.71
glim        -0.69
ription     -0.68
apers       -0.68
binge       -0.64
ilogy       -0.63
obos        -0.63
aper        -0.62
Positive Logits
Nope         0.90
Why          0.87
����         0.84
Yeah         0.84
Well         0.83
Absolutely   0.82
Nah          0.81
Didn         0.81
Where        0.79
Okay         0.79
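The logit lists above are usually built by projecting the feature's decoder direction through the model's unembedding matrix, then reading off the highest- and lowest-scoring vocabulary tokens. The sketch below shows that idea in plain Python. It rests on several assumptions. `decoder_dir` is a feature's decoder vector of length d_model. `W_U` is a d_model x vocab unembedding. The function names are hypothetical. The dashboard's exact scaling or normalization is not stated here, so the scores will not match the values above.

```python
from typing import Sequence


def logit_weights(decoder_dir: Sequence[float],
                  W_U: Sequence[Sequence[float]]) -> list[float]:
    """Hypothetical helper: dot the decoder direction with each unembedding column.

    W_U is indexed as W_U[dim][token], so the result has one score per vocab token.
    """
    d_model = len(decoder_dir)
    vocab = len(W_U[0])
    return [sum(decoder_dir[i] * W_U[i][t] for i in range(d_model))
            for t in range(vocab)]


def top_and_bottom(scores: Sequence[float], tokens: Sequence[str], k: int):
    """Return the k highest-scoring and k lowest-scoring (token, score) pairs."""
    order = sorted(range(len(scores)), key=lambda t: scores[t])  # ascending
    bottom = [(tokens[t], scores[t]) for t in order[:k]]           # most negative first
    top = [(tokens[t], scores[t]) for t in reversed(order[-k:])]   # most positive first
    return top, bottom


# Toy example: d_model = 2, vocab = 4 (invented numbers, not real model weights).
decoder_dir = [1.0, 0.5]
W_U = [[0.9, -0.7, 0.1, 0.3],
       [0.0, 0.2, 0.4, -0.6]]
tokens = ["tok_a", "tok_b", "tok_c", "tok_d"]

scores = logit_weights(decoder_dir, W_U)
top, bottom = top_and_bottom(scores, tokens, k=2)
print(top)
print(bottom)
```

Sorting once and slicing both ends keeps the sketch short. For a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nlargest` / `heapq.nsmallest` would avoid a full sort.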
Activation Density: 0.077%
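Activation density is commonly read as the fraction of token positions where the feature fires (activation above zero). A density of 0.077% = 0.00077 therefore means roughly one token in 1,300 (1 / 0.00077 ≈ 1,299). The sketch below assumes that definition and a zero threshold. The function name is hypothetical, and the dashboard's exact threshold and token sample are not given here.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: fraction of positions whose activation exceeds `threshold`."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    return active / total if total else 0.0


# Toy example: 77 firing positions out of 100,000 tokens.
acts = [0.0] * 100_000
for i in range(77):
    acts[i * 1000] = 1.0  # arbitrary positive activation
density = activation_density(acts)
print(f"{density * 100:.3f}%")
```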