    Negative Logits
    Logit  Token
    0.46   " on"
    0.46   "لي"
    0.44   "t"
    0.44   " Zayed"
    0.44   "smittel"
    0.43   " Menschen"
    0.43   "schule"
    0.43   "že"
    0.42   "ята"
    0.42   "sa"
    Positive Logits
    Logit  Token
    0.58   "م"
    0.54   "m"
    0.53   "ing"
    0.53   "ap"
    0.53   "ا"
    0.49   (blank)
    0.48   "ف"
    0.48   "ang"
    0.46   (blank)
    0.46   "ו"
    Activation Density: 0.473%

    No Known Activations