Explanations
punctuation and formatting indicators within the text
Negative Logits
allen    -0.15
ppe      -0.15
allah    -0.15
axe      -0.15
ç·Ĵ      -0.15
θα       -0.14
ulace    -0.14
yal      -0.14
robat    -0.14
ç¬       -0.14
Positive Logits
ubi      0.16
æº       0.16
bar      0.16
azes     0.15
afe      0.15
456      0.15
intr     0.15
olon     0.14
urry     0.14
_NC      0.14
Activations Density: 0.010%