INDEX
    Negative Logits
    "vyk"             -0.07
    "وین"             -0.07
    "autical"         -0.07
    "ฟอร"             -0.07
    " Claw"           -0.06
    "ंपन"             -0.06
    " misdemeanor"    -0.06
    "itest"           -0.06
    " patrons"        -0.06
    " Marino"         -0.06
    Positive Logits
    " addition"       0.06
    " باغ"            0.06
    " Scr"            0.06
    " pys"            0.06
    " afflict"        0.06
    " retir"          0.06
    " allev"          0.06
    "*/↵↵↵"           0.06
    " briefing"       0.06
    " nắng"           0.06
    Activation Density: 0.021%

    No Known Activations