INDEX
Explanations
names of authors and references in academic citations
Negative Logits
linky     -0.18
ahren     -0.17
isy       -0.15
ustin     -0.15
996       -0.14
ruary     -0.14
idia      -0.13
nr        -0.13
mic       -0.13
iore      -0.13
Positive Logits
Lam       0.17
hti       0.16
quadr     0.16
iveau     0.15
lam       0.15
quad      0.15
erken     0.15
Ā         0.14
ãĥªãĤ«    0.14
igli      0.14
Activations Density 0.010%