    Explanations

    pronouns and references to the speaker and their group

    Negative Logits
        "čka"           -0.17
        "IVA"           -0.16
        "iaux"          -0.15
        "WithValue"     -0.15
        "ossa"          -0.14
        "istrovství"    -0.14
        "regor"         -0.14
        "rena"          -0.14
        "mlink"         -0.14
        "trap"          -0.14
    Positive Logits
        "DT"             0.16
        "ONEY"           0.16
        "-fw"            0.15
        " pin"           0.15
        "CW"             0.15
        " Gel"           0.14
        " DT"            0.14
        " Hubb"          0.14
        "zi"             0.14
        "-utils"         0.14
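Lists like the two above are often produced by a "logit lens" on the feature: multiply the feature's output (decoder) direction by the model's unembedding matrix W_U and rank the vocabulary by the result. The page does not say how its values were computed, so treat the following as a minimal sketch under that assumption. The toy vocabulary, the dimensions, and the helper names `logit_effects` and `top_and_bottom` are hypothetical, and real pipelines may also fold in layer norm or other scaling.

```python
# Hypothetical toy logit-lens sketch: project a feature's decoder direction
# through an unembedding matrix and rank tokens by the resulting logit effect.
# Assumes W_U is stored as d_model rows x vocab_size columns.
from typing import List, Tuple


def logit_effects(decoder_dir: List[float],
                  w_unembed: List[List[float]],
                  vocab: List[str]) -> List[Tuple[str, float]]:
    """Return (token, effect) pairs where effect = decoder_dir . W_U[:, token]."""
    d_model = len(decoder_dir)
    effects = []
    for j, tok in enumerate(vocab):
        # Dot product of the decoder direction with column j of W_U.
        effect = sum(decoder_dir[i] * w_unembed[i][j] for i in range(d_model))
        effects.append((tok, effect))
    return effects


def top_and_bottom(effects: List[Tuple[str, float]], k: int):
    """Return the k most negative and the k most positive (token, effect) pairs."""
    ranked = sorted(effects, key=lambda pair: pair[1])
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return negative, positive


if __name__ == "__main__":
    vocab = ["a", "b", "c", "d"]
    decoder_dir = [1.0, -0.5]
    # Shape (d_model, vocab_size) = (2, 4); values are made up for illustration.
    w_unembed = [[0.2, -0.1, 0.0, 0.3],
                 [0.4, 0.2, -0.2, 0.0]]
    neg, pos = top_and_bottom(logit_effects(decoder_dir, w_unembed, vocab), k=2)
    print("negative:", neg)
    print("positive:", pos)
```

In this toy case token "b" receives the most negative effect and "d" the most positive, mirroring how the dashboard separates the two lists.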
    Activation Density: 0.001%
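Activation density is usually the fraction of sampled tokens on which the feature fires. Assuming "fires" means an activation strictly above zero (the page states neither its threshold nor its sample size), 0.001% is a fraction of 0.00001, or roughly one token in 100,000. A minimal sketch of that arithmetic, with hypothetical helper names:

```python
# Hypothetical sketch: activation density as the share of tokens whose
# activation exceeds a threshold (assumed here to be 0.0).
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of tokens with activation strictly above `threshold`."""
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


def format_density(density: float) -> str:
    """Render a fraction as a percentage with three decimal places."""
    return f"{density * 100:.3f}%"


if __name__ == "__main__":
    # One firing token out of 100,000 sampled tokens.
    acts = [0.0] * 99_999 + [2.5]
    print(format_density(activation_density(acts)))
```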

    No Known Activations