INDEX
Explanations
proper nouns or initials related to organizations or projects
references to specific organizations or research firms
Negative Logits
Eisen       -0.77
flies       -0.76
lust        -0.72
Apprentice  -0.71
thumbnails  -0.65
Cecil       -0.64
âĸijâĸij      -0.64
athan       -0.64
enegger     -0.64
bell        -0.64
Positive Logits
KP          1.05
rint        0.96
_.          0.92
PK          0.88
dh          0.85
PK          0.82
rompt       0.81
olicy       0.79
©¶æ         0.79
srf         0.76
Activations Density: 0.010%