Explanations
words related to competition and winning
instances of the word "win" in various contexts
Negative Logits
erity      -0.76
Uz         -0.66
umn        -0.63
REDACTED   -0.62
UCHIJ      -0.61
alam       -0.60
senal      -0.60
condu      -0.60
gearing    -0.60
Dwell      -0.59
Positive Logits
now        0.89
ners       0.82
nings      0.79
iem        0.74
throp      0.72
win        0.72
iors       0.71
't         0.70
ces        0.70
terson     0.69
Activations Density 0.039%