    NEGATIVE LOGITS
    !
    0.67
    !"
    0.60
     !!
    0.60
    !!
    0.59
    !]
    0.58
    !:
    0.58
    !")
    0.57
    !";
    0.57
     yummy
    0.56
    0.56
    POSITIVE LOGITS
     मैंने
    0.59
     ostensibly
    0.53
    three
    0.53
     پیسې
    0.52
     不是
    0.52
    Fuck
    0.52
    มัน
    0.51
    fuck
    0.51
    我知道
    0.50
    thirty
    0.50
    Activation Density 0.023%

    No Known Activations