    Negative Logits
    ی  1.08
    ف  0.88
    .  0.83
    ب  0.79
    ,  0.79
    (blank)  0.77
    ের  0.71
    iin  0.71
    اس  0.70
    س  0.69
    Positive Logits
    o  0.86
     făcut  0.71
     has  0.64
    (blank)  0.63
     مي  0.63
    m  0.60
     не  0.59
     භාවිත  0.59
     hebt  0.58
     nicht  0.57
    Act Density 0.003%

    No Known Activations