INDEX
Explanations
titles and positions of academic professionals
Negative Logits
Token      Logit
Brains     -0.16
ogs        -0.15
rug        -0.15
ç̬         -0.15
egin       -0.15
sonian     -0.14
Griffith   -0.14
iece       -0.14
bras       -0.14
pam        -0.14
Positive Logits
Token      Logit
Sark       0.16
INDIRECT   0.14
olib       0.13
çķª        0.13
zik        0.13
AGO        0.13
/share     0.13
Brock      0.13
HEMA       0.13
reet       0.13
Activations Density: 0.016%