INDEX
    Explanations

    instances of the word "Before."

    Negative Logits
    omet        -0.15
    ouv         -0.15
    bou         -0.15
    ÑĢажд       -0.15
    opard       -0.15
     shouldn    -0.14
    arend       -0.14
    antage      -0.14
     Fle        -0.14
    åĽ°         -0.14
    Positive Logits
    anzi          0.18
    FD            0.16
    erez          0.16
    istrovstvÃŃ   0.15
    hand          0.14
    linger        0.14
    ULLET         0.14
     Bilg         0.14
    _FD           0.14
    .fd           0.13
    Act Density 0.019%

    No Known Activations