INDEX
Explanations
phrases related to achieving victory or success
instances of the word "win."
Negative Logits
alam      -0.70
erity     -0.68
rouch     -0.64
urches    -0.63
umn       -0.63
udes      -0.62
Uz        -0.61
agg       -0.59
condu     -0.58
Dwell     -0.58
Positive Logits
throp     0.81
now       0.78
ners      0.73
iem       0.71
nings     0.70
riors     0.69
atown     0.69
stroke    0.68
ced       0.68
Citation  0.66
Activations Density: 0.038%