    Explanations

    mentions of the name "Paul."

    Negative Logits
    Token       Logit
    essor       -0.17
    ακ          -0.15
    ãĥĭãĥ¡      -0.15
    evil        -0.15
    unan        -0.15
    edd         -0.14
    reta        -0.14
    evin        -0.14
    yk          -0.14
    exchange    -0.14
    Positive Logits
    Token       Logit
    ine         0.31
    son         0.25
    sen         0.24
    raj         0.23
    INE         0.20
    sson        0.17
    SON         0.17
    mie         0.17
    s           0.17
    ie          0.17
    Activation Density: 0.017%

    No Known Activations