Explanations
phrases related to winning and losing in sports contexts
Negative Logits
Token    Logit
etal     -0.15
oten     -0.15
iet      -0.15
Grove    -0.14
gro      -0.14
ASON     -0.14
िथ       -0.14
((__     -0.14
endar    -0.14
hod      -0.14
Positive Logits
Token    Logit
battles  0.18
nable    0.17
ECTOR    0.16
krom     0.16
hearts   0.16
asan     0.15
battle   0.15
Yo       0.15
比赛     0.15
egg      0.15
Activations Density 0.091%