    Negative Logits
     متعلقه       -0.77
     étoient      -0.71
     avoient      -0.70
     الرياضيه     -0.69
     igång        -0.63
     chré         -0.62
     mourut       -0.62
     förslag      -0.60
     déput        -0.58
     feroit       -0.58
    Positive Logits
     not          0.60
     ab           0.57
     -            0.57
     —            0.56
     rule         0.55
    openzeppelin  0.55
     to           0.54
     st           0.54
     ha           0.54
     bio          0.53
    Activation Density: 0.276%

    No Known Activations