INDEX
    Explanations

    references to the word "Dodger" and its variations

    Negative Logits
    ee      -0.21
    o       -0.21
    ing     -0.20
    y       -0.20
    yb      -0.19
    oog     -0.19
    ey      -0.18
    esi     -0.17
    ed      -0.17
    ean     -0.17
    Positive Logits
    ding    0.32
    yssey   0.28
    ded     0.28
    ders    0.28
    der     0.26
    dy      0.26
    ges     0.24
    ds      0.23
    nocení  0.23
    den     0.21
    Act Density 0.027%

    No Known Activations