Explanations
phrases indicating a sense of negativity or discontent
Negative Logits
ullo        -0.19
angen       -0.17
ninger      -0.16
uide        -0.15
nev         -0.15
äge         -0.15
OUS         -0.15
aris        -0.15
richer      -0.14
åı¯æĺ¯      -0.14
Positive Logits
much        0.20
diss        0.19
ething      0.18
great       0.17
much        0.16
ragaz       0.16
Much        0.16
anymore     0.15
Much        0.15
ley         0.15
Activations Density: 0.028%