INDEX
Explanations
mentions of significant actions or achievements in sports contexts
Negative Logits
utz      -0.16
xt       -0.16
eken     -0.15
orton    -0.15
esso     -0.15
ey       -0.15
figure   -0.14
owitz    -0.14
atte     -0.14
öm       -0.14
Positive Logits
liking   0.16
lixir    0.15
-step    0.15
seats    0.15
hiba     0.14
inema    0.14
step     0.14
ëĭ´      0.14
steps    0.14
steps    0.14
Activations Density 0.045%