INDEX
    Explanations
    NEGATIVE LOGITS
    ي  0.75
    ال  0.71
    ۔  0.71
    И  0.67
    (no visible token)  0.63
    ،  0.59
    ث  0.57
    غ  0.57
    كان  0.56
    (no visible token)  0.56
    POSITIVE LOGITS
     אחד  0.51
     estern  0.49
     for  0.46
     mangiare  0.45
     mental  0.44
     Deze  0.44
     siege  0.44
     maze  0.44
     plass  0.44
     arreg  0.44
    Act Density 0.029%

    No Known Activations