    Negative Logits
    Token            Logit
                     -0.84
    " mammals"       -0.84
    " positifs"      -0.84
    " متعلقه"        -0.82
    " bénévoles"     -0.82
    " يتيمه"         -0.81
    " feroit"        -0.79
    " مرئيه"         -0.78
    " pouvoit"       -0.77
    " hazard"        -0.77
    Positive Logits
    Token            Logit
    " and"           0.57
    " che"           0.53
    "asinya"         0.46
    ","              0.46
    " an"            0.46
    " erst"          0.45
    " or"            0.45
    "ans"            0.45
                     0.44
    " the"           0.43
    Activation Density: 0.049%

    No Known Activations