INDEX
Explanations
proper nouns or personal names
mentions of a specific individual, likely a notable person in sports
Negative Logits
terday     -0.79
EED        -0.79
conclud    -0.70
eele       -0.68
anwhile    -0.67
ignment    -0.60
align      -0.59
WAYS       -0.59
ateral     -0.59
enegger    -0.58
Positive Logits
rique      1.35
ning       1.07
rik        1.04
riks       1.03
nery       0.99
sel        0.98
lein       0.98
ricks      0.95
ric        0.93
lund       0.92
Activations Density 0.029%
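
The density figure can be read as the share of tokens on which this feature fires. The sketch below assumes that reading (activation strictly above a threshold of zero); the function name, the threshold, and the toy data are hypothetical and not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the feature
    is active; the dashboard does not state its exact definition here.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: 10,000 tokens, 3 of which activate the feature.
toy_activations = [0.0] * 9997 + [1.2, 0.4, 3.1]
density = activation_density(toy_activations)
print(f"{density:.3%}")
```

Under this reading, a density of 0.029% corresponds to roughly 3 active tokens per 10,000, which fits a sparse feature that responds to a narrow class of names.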