INDEX
    Explanations
    Negative Logits
        ا  2.42
        ی  1.98
        ش  1.84
        ции  1.77
        (token not rendered)  1.77
        দের  1.74
        (token not rendered)  1.73
        (token not rendered)  1.64
        त्  1.59
        وم  1.58
    Positive Logits
        (token not rendered)  1.59
        dalam  1.50
        ্বেও  1.48
        (token not rendered)  1.44
        lm  1.43
         والصلاه  1.43
        theta  1.42
        lll  1.41
        de  1.38
        [^  1.37
    Act Density 0.010%

    No Known Activations