    Negative Logits
    Token        Logit
    " an"        0.81
    " a"         0.75
    " it"        0.73
    " ovog"      0.71
    "s"          0.70
    "he"         0.70
    "پ"          0.70
    "ના"         0.68
    (missing)    0.66
    "the"        0.65
    Positive Logits
    Token        Logit
    "-"          1.21
    "."          0.83
    "tól"        0.80
    " of"        0.79
    "د"          0.77
    "تها"        0.69
    "تين"        0.68
    (missing)    0.68
    "تنا"        0.67
    "ление"      0.67
    Act Density 0.000%

    No Known Activations