INDEX
    NEGATIVE LOGITS
     autorytatywna    -0.52
    <bos>             -0.50
     تانيه            -0.49
    ^(@)              -0.47
    amom              -0.47
    pouring           -0.47
    -\\               -0.45
     Betracht         -0.45
    })$}              -0.44
    kook              -0.44
    POSITIVE LOGITS
    awaiter           0.68
    parsedMessage     0.68
     препратки        0.66
     ddelweddau       0.62
    ViewFeatures      0.61
    buttonShape       0.61
     Kenobi           0.60
    Становништво      0.60
    igshid            0.60
     artistico        0.58
    Activation Density: 0.012%

    No Known Activations