Explanations
words that denote skepticism, questioning, or criticism
Negative Logits
oother      -0.78
ilege       -0.76
rio         -0.73
obook       -0.72
hner        -0.71
ynthesis    -0.69
endment     -0.69
psey        -0.68
umbn        -0.68
othal       -0.67
Positive Logits
enough        0.95
ones          0.80
amounts       0.79
huh           0.76
alike         0.76
sounding      0.71
indeed        0.71
nonetheless   0.71
since         0.70
strokes       0.70
Activations Density: 0.192%