Explanations
mentions of the concept of "winning"
instances of the word "win"
Negative Logits
Sentinel        -0.74
charg           -0.74
unauthorized    -0.73
underground     -0.72
charging        -0.69
charged         -0.68
ambient         -0.66
Anthem          -0.65
charges         -0.64
satirical       -0.64
Positive Logits
win             4.51
WIN             2.27
Win             2.10
winner          1.75
winning         1.52
Win             1.49
win             1.41
wine            1.36
WIN             1.31
wyn             1.31
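The page does not say how the logit values above were computed. A common convention in feature dashboards of this kind is to project the feature's output direction through the model's unembedding matrix and list the tokens with the largest and smallest resulting scores. The sketch below illustrates that convention on toy data only; `logit_effects`, `top_and_bottom`, the two-dimensional vectors, and the toy vocabulary are all hypothetical and are not taken from this page.

```python
from typing import List, Sequence, Tuple


def logit_effects(direction: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Dot each token's unembedding row with the feature direction."""
    return [sum(w * d for w, d in zip(row, direction)) for row in unembed]


def top_and_bottom(tokens: Sequence[str],
                   effects: Sequence[float],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, effect) pairs."""
    ranked = sorted(zip(tokens, effects), key=lambda pair: pair[1], reverse=True)
    positive = ranked[:k]
    negative = ranked[-k:][::-1]  # most negative first
    return positive, negative


# Toy data (assumed for illustration): 4 tokens, 2-dimensional model.
tokens = ["win", "winner", "charg", "ambient"]
unembed = [[2.0, 0.5], [1.0, -0.3], [-0.5, 0.2], [-0.25, 0.9]]
direction = [1.0, 0.0]

effects = logit_effects(direction, unembed)
positive, negative = top_and_bottom(tokens, effects, k=2)
print("positive:", positive)
print("negative:", negative)
```

Under this reading, a large positive value for "win" means the feature pushes the model toward predicting that token, while the small negative values are weak suppression of unrelated tokens.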
Activations Density 0.006%
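Assuming "Activations Density" means the fraction of tokens on which the feature has a nonzero activation (the page does not define it), 0.006% corresponds to roughly 6 active tokens per 100,000. A minimal sketch of that calculation follows; `activation_density` and the toy activation list are hypothetical and only illustrate the arithmetic.

```python
from typing import Sequence


def activation_density(activations: Sequence[float]) -> float:
    """Fraction of positions where the feature fires (activation > 0)."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > 0)
    return active / len(activations)


# Toy data (assumed): 100,000 token positions, 6 of them active.
toy_activations = [0.0] * 100_000
for position in (10, 500, 7_000, 42_000, 63_500, 99_999):
    toy_activations[position] = 1.0

density = activation_density(toy_activations)
print(f"{density * 100:.3f}%")
```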