INDEX
Explanations
quoted text or comments in code
Negative Logits
Token     Logit
zel       -0.17
Spoiler   -0.16
wers      -0.16
Cher      -0.15
eker      -0.15
Gos       -0.15
Lonely    -0.14
ãĤ¸ãĤ¢    -0.14
иком      -0.14
cher      -0.14
Positive Logits
Token     Logit
oard      0.17
åIJĽ       0.16
645       0.16
otron     0.16
otor      0.15
æ·        0.15
okane     0.14
ourd      0.14
ONT       0.14
247       0.14
Activations Density: 0.021%