INDEX
Explanations
phrases related to negative societal issues or criticism
Negative Logits
Token        Logit
://%         -0.14
bourg        -0.14
oto          -0.14
à¥ĭà¤Ĥ,      -0.14
TM           -0.14
AVED         -0.14
:"-"`↵       -0.13
UID          -0.13
-д           -0.13
ancy         -0.13
Positive Logits
Token        Logit
/etc         0.56
etc          0.29
/            0.29
/&           0.26
combo        0.26
etc          0.25
combos       0.22
combination  0.21
hybrid       0.21
ratio        0.21
Activations Density: 0.122%