Explanations
Various instances of names and naming decisions throughout the document.
New Auto-Interp
Negative Logits
opath       -0.15
-rated      -0.15
illos       -0.14
Ratings     -0.14
elow        -0.14
-peer       -0.14
axiom       -0.13
opathic     -0.13
话          -0.13
ucher       -0.13
Positive Logits
hy           0.22
chosen       0.20
shorten      0.18
urally       0.18
descriptive  0.18
Adopt        0.17
adopted      0.17
نسبة         0.17
響           0.17
pun          0.17
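The positive and negative logit lists above are the kind of output a feature dashboard typically produces by projecting a feature's decoder direction through the model's unembedding matrix and keeping the largest and smallest entries. The sketch below illustrates that idea in plain Python under stated assumptions: the function `logit_effects` and its parameters `decoder_direction`, `unembedding`, and `vocab` are hypothetical, the toy numbers in the usage note are made up, and the dashboard that produced these values may compute or normalize them differently.

```python
from typing import List, Tuple


def logit_effects(
    decoder_direction: List[float],   # hypothetical: the feature's decoder vector, length d_model
    unembedding: List[List[float]],   # hypothetical: W_U laid out as d_model rows x vocab columns
    vocab: List[str],                 # token string for each vocab column
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (top-k positive, top-k negative) tokens by direct logit effect."""
    d_model = len(decoder_direction)
    # Direct effect on each token's logit: dot product of the feature
    # direction with that token's column of the unembedding matrix.
    effects = [
        sum(decoder_direction[i] * unembedding[i][t] for i in range(d_model))
        for t in range(len(vocab))
    ]
    # Sort ascending so the most negative effects come first.
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return positive, negative
```

With a toy direction `[1.0, 2.0]` and a two-row unembedding over four tokens, `logit_effects(..., k=2)` returns the two tokens with the largest dot products as the positive list and the two with the smallest as the negative list, mirroring the two tables above.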
Activation Density 0.170%
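The activation density figure is most naturally read as the percentage of tokens in an evaluation set on which the feature fires. A minimal sketch, assuming "fires" means an activation strictly above a threshold of zero; the dashboard's actual threshold and dataset are not stated here, and the function name `activation_density` is hypothetical.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation exceeds `threshold`."""
    total = 0
    active = 0
    for value in activations:
        total += 1
        # Count a token as active only if its activation is above the threshold.
        if value > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations provided")
    return 100.0 * active / total


# Illustrative only: 17 active tokens out of 10,000.
acts = [0.0] * 9983 + [1.2] * 17
print(f"{activation_density(acts):.3f}%")
```

For example, 17 active tokens out of 10,000 gives a density of 0.170%, the same order of sparsity reported for this feature.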