    Explanations

    numbers and percentages

    Negative Logits
    or                    0.64
    ла                    0.58
    [unrendered token]    0.57
    a                     0.56
    ر                     0.55
    ال                    0.54
    r                     0.52
    ни                    0.52
    𝐚                     0.52
    ли                    0.51
    Positive Logits
    をと                  0.41
     unscrupulous         0.40
     যারা                 0.39
    を有する              0.37
    要把                  0.36
     reaffirm             0.36
    要有                  0.36
    [unrendered token]    0.36
    を知                  0.36
     تعرض                 0.35
    Activation Density: 0.550%

    No Known Activations