INDEX
Explanations
negative sentiments or expressions
Negative Logits
unſer         -0.73
<unused79>    -0.71
<unused41>    -0.71
ſeyn          -0.71
<unused23>    -0.71
<unused14>    -0.71
<unused52>    -0.71
<unused74>    -0.71
[@BOS@]       -0.70
<unused1>     -0.70
Positive Logits
-             0.61
P             0.44
paddingBottom 0.40
_             0.40
V             0.39
p             0.39
C             0.39
↵             0.39
ug            0.38
Sha           0.38
Activations Density 0.001%