Explanations
sentences expressing strong emotions or opinions
repetitive statements and assertions
Negative Logits
esm       -0.78
elve      -0.73
dozen     -0.71
styles    -0.70
aneers    -0.69
luaj      -0.68
ses       -0.68
byss      -0.67
opez      -0.66
enne      -0.65
Positive Logits
why             0.93
unacceptable    0.93
happening       0.91
how             0.90
NOT             0.85
what            0.84
supposed        0.84
shaping         0.81
bullshit        0.80
definitely      0.80
Activations Density: 0.097%