INDEX
    Negative Logits
    " متعلقه"          -0.67
    "AnchorStyles"     -0.67
    " varandra"        -0.59
    " alimentaires"    -0.58
    " vieille"         -0.58
    " pintadas"        -0.57
    " vieilles"        -0.56
    " بيها"            -0.55
    " gammel"          -0.55
    " gamle"           -0.52

    Positive Logits
    " but"             0.76
    " pero"            0.66
    " professionals"   0.61
    "kehr"             0.61
    "but"              0.59
    " tapi"            0.59
    "torie"            0.58
    " professors"      0.57
    " engineers"       0.57
    " members"         0.57

    Activation Density: 0.001%

    No Known Activations