Explanations
instances of the word "won."
Negative Logits
cia     -0.15
bos     -0.14
Kral    -0.14
ather   -0.14
bian    -0.14
å±ŀ     -0.14
yll     -0.14
thon    -0.13
cki     -0.13
uce     -0.13
Positive Logits
nable    0.23
hearts   0.20
now      0.20
battles  0.18
ipeg     0.17
REP      0.16
-win     0.15
atur     0.14
emem     0.14
ê¶Į      0.14
Activations Density: 0.036%