    Explanations

    leaving messages or items

    Negative Logits
    Token        Logit
    " TAS"       0.39
    "yson"       0.37
    " বেঙ্গল"     0.37
    "ച്ഛ"         0.37
    "discrimin"  0.37
    " snout"     0.37
    "blur"       0.37
    "ëlle"       0.37
    "verage"     0.36
    "housing"    0.36
    Positive Logits
    Token         Logit
    "留言"         1.45
    " message"    1.31
    " messages"   1.31
    "メッセージ"    1.25
    " leaving"    1.23
    " leave"      1.21
    " Leave"      1.20
    " Message"    1.16
    " Leaving"    1.16
    " оставля"    1.12
    Activation Density: 0.009%

    No Known Activations