INDEX
Explanations
names of historical figures and scientists
Negative Logits
Token         Logit
BUG           -0.14
allas         -0.13
oring         -0.13
avid          -0.13
CardContent   -0.13
subst         -0.12
βα            -0.12
ç¡            -0.12
aling         -0.12
êu            -0.12
Positive Logits
Token   Logit
Ã¶ÄŁ    0.15
Jr      0.15
psc     0.15
gran    0.14
Lion    0.14
âĢł     0.13
Smy     0.13
192     0.12
iena    0.12
mann    0.12
Activations Density 0.104%