INDEX
Explanations
words related to winning and victory
Negative Logits
yll          -0.17
iagnostics   -0.16
illon        -0.16
cia          -0.16
eb           -0.16
illo         -0.15
ASON         -0.15
bian         -0.15
wine         -0.15
wald         -0.15
Positive Logits
nable        0.31
-win         0.23
ning         0.23
now          0.23
throp        0.21
ograd        0.20
ery          0.20
ners         0.19
-loss        0.19
try          0.18
Activations Density 0.050%