INDEX
Explanations
phrases indicating frequency or repetition
Negative Logits
formace  -0.17
sert  -0.15
lix  -0.15
â̦”↵↵  -0.15
llib  -0.15
voÅĻ  -0.15
abbo  -0.15
IBUTE  -0.14
somehow  -0.14
ยม  -0.14
Positive Logits
ìĶ©  0.21
-times  0.21
even  0.20
place  0.20
referred  0.17
ľ  0.17
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~  0.16
even  0.15
kus  0.15
даже  0.15
Activations Density 0.017%