INDEX
Explanations
the word "winner" with various intensities
terms related to winners and winning outcomes
Negative Logits
Token     Logit
uli       -0.75
uku       -0.64
Dub       -0.63
aeda      -0.63
abb       -0.62
uilt      -0.62
rients    -0.62
Ku        -0.61
ORE       -0.60
ugu       -0.60
Positive Logits
Token        Logit
winner       1.19
Winner       1.05
Winner       1.05
winners      0.97
contestant   0.95
prize        0.88
laureate     0.88
loser        0.88
winner       0.84
stakes       0.79
Activations Density: 0.016%
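
As a rough illustration of how a density figure like this could be read, the sketch below assumes that "activations density" means the fraction of tokens on which the feature fires above a threshold. That definition, the function name `activation_density`, and the toy data are all assumptions for illustration and are not taken from the page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumed definition: density = (# active tokens) / (# tokens seen).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Made-up toy data (not the feature's real activations):
# 2 active tokens out of 12,500.
toy_activations = [0.0] * 12_498 + [3.1, 0.7]
density = activation_density(toy_activations)
print(f"{density:.3%}")
```

Under this reading, a density of 0.016% would mean the feature is active on roughly 16 out of every 100,000 tokens, which fits a feature tied to a narrow concept.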