INDEX
Explanations
negations and expressions of doubt or uncertainty
Negative Logits
.synthetic   -0.17
oley         -0.17
UNUSED       -0.16
aghan        -0.15
ίθ           -0.14
alker        -0.14
ilerini      -0.14
.Empty       -0.14
okers        -0.14
olla         -0.14
Positive Logits
bad          0.90
Bad          0.85
Bad          0.79
bad          0.79
BAD          0.77
_bad         0.66
BAD          0.65
worst        0.63
worse        0.63
.bad         0.59
Activations Density 0.220%