INDEX
Explanations
negations and words indicating the concept of "not."
Negative Logits
strup    -0.15
isko     -0.15
OfDay    -0.14
age      -0.14
hong     -0.14
861      -0.14
863      -0.13
dae      -0.13
essen    -0.13
hei      -0.13
Positive Logits
just     0.15
ensch    0.15
etur     0.15
ost      0.15
achi     0.15
byt      0.15
agna     0.15
ouve     0.14
wich     0.14
ones     0.14
Activations Density 0.069%