Explanations
expressions of criticism about societal or systemic issues
Negative Logits
unlimited      -0.15
tả             -0.15
ись            -0.14
orman          -0.14
ko             -0.14
Projection     -0.14
Champ          -0.14
kiye           -0.13
nonexistent    -0.13
projection     -0.13
Positive Logits
aju             0.16
moment          0.15
andr            0.15
igon            0.15
acci            0.15
amar            0.15
gesi            0.14
/inc            0.14
coni            0.14
etc             0.14
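Logit lists like the two above usually show a feature's direct effect on each vocabulary token's logit. The source does not say how its values were computed. The sketch below assumes the common logit-lens convention: take the dot product of the feature's decoder vector with each column of the unembedding matrix, then keep the k most positive and k most negative tokens. `logit_effects`, `top_and_bottom`, and all of the data are hypothetical toy placeholders, not values from this feature.

```python
# Hypothetical sketch (assumed convention, not confirmed by the source):
# a token's logit effect = dot(decoder_vec, unembed[:, token]).
# All names and numbers are toy placeholders.

def logit_effects(decoder_vec, unembed, vocab):
    """Return {token: effect}: the dot product of the feature's decoder
    vector with that token's column of the (d_model x vocab) unembedding."""
    effects = {}
    for j, token in enumerate(vocab):
        effects[token] = sum(d * unembed[i][j] for i, d in enumerate(decoder_vec))
    return effects


def top_and_bottom(effects, k):
    """Return (positive, negative): the k largest effects (descending)
    and the k smallest effects (ascending, most negative first)."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]
    positive = sorted(ranked[-k:], key=lambda kv: kv[1], reverse=True)
    return positive, negative


# Toy 2-d model with a 4-token vocabulary.
vocab = ["moment", "etc", "unlimited", "projection"]
decoder_vec = [0.5, -0.25]
unembed = [
    [0.2, 0.1, -0.2, -0.1],
    [-0.2, -0.1, 0.2, 0.1],
]

effects = logit_effects(decoder_vec, unembed, vocab)
pos, neg = top_and_bottom(effects, 2)
print("positive:", [(t, round(v, 3)) for t, v in pos])
print("negative:", [(t, round(v, 3)) for t, v in neg])
```

Real dashboards run this over the full vocabulary with matrix operations. The plain loops here only make the per-token dot product explicit.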
Activations Density 0.014%
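Activation density is usually read as the share of sampled token positions on which the feature fires. The source does not state its threshold or sample size. The sketch below assumes a strict "activation > 0" rule and uses made-up data chosen so that the result matches the 0.014% figure. `activation_density` is a hypothetical helper.

```python
# Hypothetical sketch: activation density = fraction of token positions
# whose activation exceeds a threshold (assumed here to be 0.0).

def activation_density(activations, threshold=0.0):
    """Return the fraction of positions where activation > threshold."""
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Toy data: the feature fires on 7 of 50,000 token positions.
acts = [0.0] * 50_000
for idx in (10, 500, 9_000, 12_345, 20_000, 33_333, 49_999):
    acts[idx] = 1.3

density = activation_density(acts)
print(f"{density:.3%}")  # 7 / 50,000 -> "0.014%"
```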