INDEX
Explanations
punctuation and formatting symbols in text
Negative Logits
yc       -0.16
avan     -0.15
ps       -0.15
nas      -0.14
alias    -0.14
али      -0.14
ungal    -0.14
umb      -0.13
ellow    -0.13
_Record  -0.13
Positive Logits
igli     0.17
ahat     0.17
ÌĨ       0.15
evin     0.15
#        0.14
gross    0.14
illac    0.14
ìŀĸ      0.14
kova     0.14
kul      0.13
Activations Density 0.023%