Explanations
Phrases related to winning or success
Negative Logits
tes          -0.19
vore         -0.16
EB           -0.16
als          -0.16
libertine    -0.15
ei           -0.14
uteur        -0.14
les          -0.14
otte         -0.14
visor        -0.13
Positive Logits
nable        0.24
throp        0.17
-win         0.17
ners         0.15
imbledon     0.15
rate         0.14
eyse         0.14
ildo         0.14
now          0.14
«ng          0.14
Activations Density: 0.066%