INDEX
Explanations
punctuation and formatting characters in text
Negative Logits
Č  -0.23
ayscale  -0.16
camp  -0.15
Fried  -0.15
aland  -0.15
Edition  -0.14
itchens  -0.14
Butt  -0.14
Occurred  -0.14
úb  -0.14
Positive Logits
č↵č↵č↵  0.22
↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵  0.22
###  0.21
##  0.21
####  0.18
----------↵↵  0.17
---↵↵  0.16
↵↵↵↵↵↵↵  0.16
алом  0.16
↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵  0.15
Activations Density 0.356%