INDEX
Explanations
phrases expressing negative outcomes or unfortunate situations
Negative Logits
ollapsed    -0.16
cono        -0.15
čka         -0.15
swick       -0.14
innacle     -0.14
bedo        -0.14
wonder      -0.14
czy         -0.14
ünk         -0.14
ont         -0.13
Positive Logits
ably        0.23
iteral      0.17
omik        0.15
omas        0.15
-looking    0.14
istan       0.14
/un         0.14
mente       0.14
CTS         0.14
-hero       0.14
Activations Density 0.015%
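
For readers unfamiliar with the density figure, the following is a minimal sketch of how such a number is commonly computed, assuming density means the fraction of token positions on which the feature's activation is above zero. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and are not taken from the dashboard itself.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 3 of 20,000 token positions.
acts = [0.0] * 20000
for position in (5, 1200, 19999):
    acts[position] = 1.7

density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.015%
```

Under that assumption, a density of 0.015% corresponds to roughly 15 active positions per 100,000 tokens, which marks the feature as sparse.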