Explanations
mentions of specific character names or identities
Negative Logits
Token      Logit
>NN        -0.17
isman      -0.16
zdy        -0.15
ombo       -0.15
qli        -0.15
بر         -0.14
ropp       -0.14
Ramos      -0.14
امین       -0.14
bdb        -0.14
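Some tokens in these lists appear in the byte-level spelling used by GPT-2-style BPE tokenizers, where each raw UTF-8 byte is mapped to a printable stand-in character. Under that scheme `æģ©` spells 恩, `ÅĤo` spells "ło", and `ÙħÛĮÙĨ` spells "مین" (the leading ا of امین was already decoded on the page). The page does not name the tokenizer, so the following is only a sketch assuming GPT-2's byte-to-character table; `decode_byte_level` is a hypothetical helper name.

```python
from typing import Dict


def bytes_to_unicode() -> Dict[int, str]:
    """GPT-2's reversible table mapping each byte value 0-255 to a printable character."""
    # Bytes that are already printable map to themselves.
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    # Remaining bytes (control chars, space, etc.) are shifted to U+0100 and up, in order.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


# Inverse table: stand-in character -> original byte value.
_BYTE_DECODER = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_byte_level(token: str) -> str:
    """Turn a byte-level BPE token string back into readable UTF-8 text (hypothetical helper)."""
    raw = bytes(_BYTE_DECODER[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


print(decode_byte_level("æģ©"), decode_byte_level("ÅĤo"), decode_byte_level("ÙħÛĮÙĨ"))
```

Characters outside the 256-entry table (such as an already-decoded ا) raise a `KeyError`, so a mixed string like the raw امین token would need to be decoded only on its byte-level part.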
Positive Logits
Token      Logit
butt       0.17
aber       0.15
Everett    0.15
cke        0.15
oge        0.15
ande       0.15
Butt       0.15
恩         0.14
Won        0.14
ło         0.14
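The two logit lists rank vocabulary tokens by how strongly this feature pushes their output logits up or down. One common way to produce such lists, assumed here rather than confirmed by the page, is to take the dot product of the feature's decoder direction with each token's unembedding vector and keep the k smallest and k largest values. The sketch uses made-up two-dimensional vectors and placeholder tokens (`tok_a`, `tok_b`, `tok_c`); `logit_effects` and `top_and_bottom` are hypothetical helpers, and real pipelines may also account for a final layer norm.

```python
from typing import Dict, List, Sequence, Tuple

Ranked = List[Tuple[str, float]]


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Direct logit contribution of a unit feature activation to each token (assumed method)."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, col))
            for tok, col in unembed.items()}


def top_and_bottom(effects: Dict[str, float], k: int) -> Tuple[Ranked, Ranked]:
    """Return (most promoted, most suppressed) tokens, each list strongest first."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by effect
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return positive, negative


# Toy, made-up weights: not the real feature's decoder or unembedding.
effects = logit_effects([0.5, -0.2], {
    "tok_a": [0.3, -0.1],
    "tok_b": [-0.2, 0.2],
    "tok_c": [0.0, 0.1],
})
pos, neg = top_and_bottom(effects, k=1)
print(pos, neg)
```

Sorting once and slicing from both ends keeps the negative list in the same order the dashboard uses (most negative first).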
Activation Density: 0.337%
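Activation density is usually read as the share of sampled tokens on which the feature fires, i.e. has an activation above zero; under that reading, 0.337% means roughly 3.4 firing tokens per 1,000. A minimal sketch assuming a zero threshold; `activation_density` is a hypothetical helper, and the page does not state its exact threshold or sample size.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation exceeds `threshold`."""
    acts = list(activations)
    if not acts:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in acts if a > threshold)
    return 100.0 * firing / len(acts)


# Toy sample: 1,000 tokens, 3 of which activate the feature.
sample = [0.0] * 997 + [1.2, 0.4, 3.0]
print(f"{activation_density(sample):.3f}%")
```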