INDEX
Explanations
phrases and sentences that convey contrasts or nuanced statements
Negative Logits
odus       -0.15
nothing    -0.15
hlad       -0.14
leh        -0.14
Nothing    -0.14
emd        -0.14
få         -0.14
ブリ       -0.13
é«         -0.13
Nothing    -0.13
Positive Logits
nor         0.50
Nor         0.42
nor         0.41
Nor         0.36
NOR         0.30
anymore     0.22
sondern     0.22
epad        0.18
нор         0.18
Norwegian   0.17
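
Dashboards that read token strings straight from a byte-level BPE vocabulary can show non-ASCII tokens in a raw display form, for example `ãĥĸãĥª` for ブリ or `ноÑĢ` for нор. The sketch below maps such strings back to readable text. It assumes a GPT-2-style byte-to-character table, which this page does not confirm. The names `build_display_to_byte` and `decode_token` are hypothetical helpers, not part of any library.

```python
def build_display_to_byte():
    """Build the display-character -> byte map used by GPT-2-style byte-level BPE.

    Assumption: the vocabulary follows the GPT-2 convention, where printable
    Latin-1 bytes map to themselves and the remaining bytes map to 256, 257, ...
    """
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    table = {chr(b): b for b in printable}
    offset = 0
    for b in range(256):
        if b not in printable:
            table[chr(256 + offset)] = b
            offset += 1
    return table


DISPLAY_TO_BYTE = build_display_to_byte()


def decode_token(display):
    """Decode a raw byte-level token string into readable text.

    Characters outside the table (already-decoded text such as Cyrillic
    letters) are passed through as their own UTF-8 bytes. Incomplete
    UTF-8 sequences become the replacement character U+FFFD.
    """
    raw = b"".join(
        bytes([DISPLAY_TO_BYTE[ch]]) if ch in DISPLAY_TO_BYTE else ch.encode("utf-8")
        for ch in display
    )
    return raw.decode("utf-8", errors="replace")


print(decode_token("ãĥĸãĥª"))
print(decode_token("ноÑĢ"))
```

Only pass strings that are still in raw display form. Text that is already decoded and contains Latin-1 letters (for example `få`) would be misread as raw bytes. A token such as `é«` is a truncated multi-byte sequence, so it has no standalone decoding.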
Activations Density 0.142%
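
A density figure like this is usually read as the fraction of sampled tokens on which the feature has a nonzero activation. That reading is an assumption here, since the page does not define the term. A minimal sketch of the computation, using synthetic activations rather than data from this feature (the function name `activation_density` is hypothetical):

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of activations strictly above `threshold`.

    Assumption: "density" means the share of sampled tokens on which the
    feature fires, not an average activation magnitude.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Synthetic example: the feature fires on 142 of 100,000 tokens.
sample = [0.0] * 99_858 + [1.3] * 142
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.142%
```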