Explanations
references to sports moments and narratives of underdog victories
Negative Logits
ornings        -0.16
Ñıд            -0.15
inati          -0.15
istrovství     -0.14
phere          -0.14
ields          -0.14
killer         -0.13
îł             -0.13
jadx           -0.13
зд             -0.13
Positive Logits
upset           0.44
Cinder          0.33
surprise        0.31
David           0.30
under           0.28
unexpected      0.27
dark            0.26
Dav             0.25
surprises       0.25
unlikely        0.24
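The two lists above rank vocabulary tokens by how strongly this feature pushes their logits up or down. The sketch below assumes the listed values are the dot product of the feature's decoder direction with each token's unembedding column, a common convention for these dashboards that this page does not confirm. The function name `top_logit_effects` and its inputs are hypothetical, and the toy vectors in the usage example are made up for illustration only.

```python
from typing import Dict, List, Tuple


def top_logit_effects(
    decoder_dir: List[float],
    unembed_cols: Dict[str, List[float]],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Rank tokens by direct logit effect (hypothetical helper).

    Assumption: effect(token) = dot(decoder_dir, unembedding column of token).
    Returns (top_positive, top_negative), each a list of (token, score) pairs.
    """
    # Dot product of the feature direction with every token's unembedding column.
    scores = {
        tok: sum(d * u for d, u in zip(decoder_dir, col))
        for tok, col in unembed_cols.items()
    }
    # Largest scores first for the "Positive Logits" list.
    top_pos = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]
    # Smallest (most negative) scores first for the "Negative Logits" list.
    top_neg = sorted(scores.items(), key=lambda kv: kv[1])[:k]
    return top_pos, top_neg


# Toy usage with a 2-dimensional residual stream (illustrative numbers only).
decoder_dir = [1.0, 0.0]
unembed_cols = {
    "upset": [0.44, 0.1],
    "killer": [-0.13, 0.5],
    "dark": [0.26, -1.0],
}
top_pos, top_neg = top_logit_effects(decoder_dir, unembed_cols, k=2)
print(top_pos)
print(top_neg)
```

With a real model, the unembedding matrix has one column per vocabulary entry, which is why raw byte-level tokens such as partial UTF-8 sequences can appear in the lists.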
Activation density: 0.192%
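Activation density is read here as the share of token positions on which the feature fires, so 0.192% corresponds to roughly 192 of every 100,000 tokens. The sketch below assumes "fires" means an activation strictly above a threshold of 0.0. The dashboard's exact threshold and sampled dataset are not stated on this page, and `activation_density` is a hypothetical name.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`.

    Assumption: a feature counts as active when its activation > threshold.
    """
    acts = list(activations)
    if not acts:
        raise ValueError("need at least one activation")
    # Count positions where the feature is active.
    firing = sum(1 for a in acts if a > threshold)
    return 100.0 * firing / len(acts)


# 192 active positions out of 100,000 tokens gives a density of about 0.192%.
density = activation_density([0.0] * 99_808 + [1.0] * 192)
print(f"{density:.3f}%")
```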