Explanations
References to the word "Dodger" and its variations
Negative Logits
ee      -0.21
o       -0.21
ing     -0.20
y       -0.20
yb      -0.19
oog     -0.19
ey      -0.18
esi     -0.17
ed      -0.17
ean     -0.17
Positive Logits
ding       0.32
yssey      0.28
ded        0.28
ders       0.28
der        0.26
dy         0.26
ges        0.24
ds         0.23
nocenÃŃ    0.23
den        0.21
Activation Density: 0.027%