INDEX
Explanations
mentions of authors and their works
Negative Logits
ary       -0.20
iguous    -0.16
znik      -0.16
aq        -0.14
567       -0.14
aries     -0.14
ei        -0.14
McDon     -0.14
æĹı       -0.14
ARY       -0.14
Positive Logits
itative   0.19
admin     0.18
cb        0.16
etas      0.15
ama       0.15
izen      0.15
CB        0.15
\"        0.15
itarian   0.15
SHIP      0.15
Activations Density: 0.010%