INDEX
Explanations
references to the concept of "fake."
Negative Logits
atto      -0.18
aidu      -0.17
ache      -0.15
phan      -0.15
ions      -0.15
ětš       -0.15
udiante   -0.14
посл      -0.14
τοκ       -0.14
atori     -0.14
Positive Logits
kus       0.16
anton     0.16
-script   0.14
olina     0.14
ko        0.14
_Cmd      0.14
scoop     0.13
冒        0.13
Bid       0.13
KO        0.13
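On dashboards like this one, the negative and positive logit lists are typically produced by projecting the feature's decoder direction through the model's unembedding matrix and keeping the most extreme entries. The sketch below makes several assumptions: a hypothetical decoder vector `w_dec` of shape (d_model,), an unembedding `W_U` of shape (d_model, vocab_size), and no layer norm folded in. The toy matrices are illustrative only and are not this feature's weights.

```python
import numpy as np

def top_logit_effects(w_dec, W_U, vocab, k=10):
    """Return the k most negative and k most positive (token, logit) pairs.

    Assumption: w_dec has shape (d_model,) and W_U has shape
    (d_model, vocab_size); vocab[i] is the string for token id i.
    """
    logits = w_dec @ W_U                 # direct effect of the feature on each vocab logit
    order = np.argsort(logits)           # token ids sorted by ascending logit
    neg = [(vocab[i], float(logits[i])) for i in order[:k]]
    pos = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return neg, pos

# Toy example with a 2-dim residual stream and a 4-token vocabulary.
vocab = ["a", "b", "c", "d"]
W_U = np.array([[1.0, -1.0, 0.5, 0.0],
                [0.0,  0.0, 0.0, 2.0]])
w_dec = np.array([1.0, -0.1])
neg, pos = top_logit_effects(w_dec, W_U, vocab, k=2)
print(neg)  # [('b', -1.0), ('d', -0.2)]
print(pos)  # [('a', 1.0), ('c', 0.5)]
```

Because this is only the direct path through the unembedding, the listed tokens show which outputs the feature nudges up or down. They need not match the tokens on which the feature itself fires.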
Activations Density 0.012%
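Activation density is usually reported as the share of sampled tokens on which the feature is active. A density of 0.012% therefore means the feature fires on roughly 12 of every 100,000 tokens. The following sketch assumes a flat array of per-token activations and treats any value above a threshold of zero as "active". The exact threshold and sampling procedure used by the dashboard are assumptions here.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`."""
    acts = np.asarray(acts, dtype=float)
    return float((acts > threshold).mean())

# Toy example: 12 active positions out of 100,000 sampled tokens.
acts = np.zeros(100_000)
acts[:12] = 1.0
density = activation_density(acts)
print(f"{density * 100:.3f}%")  # 0.012%
```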