INDEX
    Negative Logits
    (token missing in source)  0.96
    " it"  0.81
    " to"  0.78
    (token missing in source)  0.76
    "س"  0.75
    "ح"  0.75
    "آ"  0.72
    (token missing in source)  0.72
    " देम"  0.69
    "ف"  0.68
    Positive Logits
    "is"  0.98
    "il"  0.95
    "n"  0.93
    "at"  0.92
    "description"  0.82
    "la"  0.79
    "re"  0.78
    "ла"  0.78
    "oare"  0.73
    "a"  0.73
    Activation Density: 0.026%

    No Known Activations