INDEX
Explanations
references to former professional athletes or individuals associated with sports
Negative Logits
písem    -0.20
urger    -0.16
yal      -0.15
Kir      -0.15
_LARGE   -0.15
:::      -0.15
aire     -0.14
pter     -0.14
957      -0.14
regs     -0.14
Positive Logits
cel      0.19
Cel      0.18
mist     0.17
vole     0.17
Cel      0.16
Soup     0.16
.dev     0.16
MS       0.16
ibble    0.16
Mist     0.15
Activations Density 0.006%