INDEX
Explanations
topics related to violence and suffering
Negative Logits
鹿         -0.17
항         -0.15
bsite      -0.15
izzo       -0.14
832        -0.14
اØ         -0.14
loh        -0.14
①         -0.14
gings      -0.13
ABCDE      -0.13
Positive Logits
hundreds   0.17
ulta       0.15
guard      0.15
thousands  0.14
countless  0.14
сол        0.14
861        0.14
Hundreds   0.13
?}",       0.13
adol       0.13
Activation Density: 0.229%