Explanations
statements that assert or emphasize factual information
Negative Logits
Token       Logit
erah        -0.18
axon        -0.17
fant        -0.15
uyu         -0.15
chwitz      -0.15
Anyway      -0.15
Anyway      -0.14
inheritDoc  -0.14
yet         -0.13
ICO         -0.13
Positive Logits
Token       Logit
sogar       0.19
scratch     0.19
ually       0.17
even        0.17
ave         0.16
arend       0.16
umber       0.16
lider       0.15
ῦ           0.15
sometimes   0.15
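Each list is a top-10 ranking of tokens by logit value. The dashboard does not say how the per-token values are computed, so the sketch below assumes they are already available as (token, logit) pairs and only shows the ranking step. `top_logits` is a hypothetical helper, and the sample uses a handful of the values listed above.

```python
from typing import List, Tuple

Pair = Tuple[str, float]

def top_logits(effects: List[Pair], k: int) -> Tuple[List[Pair], List[Pair]]:
    """Hypothetical helper: return the k most negative and k most positive pairs.

    Assumes `effects` already holds one (token, logit) value per token;
    how those values are derived is not specified by the dashboard.
    """
    # Ascending sort puts the most negative logits first.
    negative = sorted(effects, key=lambda pair: pair[1])[:k]
    # Descending sort puts the most positive logits first (stable for ties).
    positive = sorted(effects, key=lambda pair: pair[1], reverse=True)[:k]
    return negative, positive

# A few values taken from the tables above.
sample = [("erah", -0.18), ("axon", -0.17), ("yet", -0.13),
          ("sogar", 0.19), ("even", 0.17), ("sometimes", 0.15)]
neg, pos = top_logits(sample, 2)
print(neg)  # [('erah', -0.18), ('axon', -0.17)]
print(pos)  # [('sogar', 0.19), ('even', 0.17)]
```

Keeping both rankings as plain sorts makes tie-breaking deterministic: tokens with equal values stay in their input order.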
Activation density: 0.028%
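Activation density is presumably the share of token positions on which the feature fires. The sketch below assumes "fires" means an activation strictly greater than zero (the dashboard does not give its exact threshold). `activation_density` is a hypothetical helper, and the activations are synthetic, chosen so that 28 of 100,000 tokens are active.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: fraction of positions whose activation exceeds `threshold`."""
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)

# Synthetic example: 28 active positions out of 100,000.
acts = [0.0] * 99_972 + [1.0] * 28
density = activation_density(acts)
print(f"{density:.3%}")  # 0.028%
```

At this density the feature is very sparse: on average it is active on fewer than 3 tokens in every 10,000.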