Explanations
references to historical events and figures in sports
Negative Logits
yll        -0.18
共和       -0.17
vil        -0.17
mue        -0.16
afort      -0.16
arella     -0.15
ibri       -0.15
博士       -0.15
Gol        -0.15
ittings    -0.15
Positive Logits
hookers    0.26
Sale       0.25
Bath       0.25
Sar        0.24
provinces  0.24
scr        0.23
Scar       0.23
Barbar     0.23
flank      0.23
hook       0.22
Activations Density 0.015%