INDEX
    Explanations
    NEGATIVE LOGITS
    "ur"                -1.84
    "UR"                -1.68
    " ur"               -1.36
    " UR"               -1.33
    "Ur"                -1.32
    " Ur"               -1.29
    "URR"               -1.13
    "urma"              -1.05
    "urp"               -1.05
    "urit"              -1.04
    POSITIVE LOGITS
    "med"               0.55
    "do"                0.52
    "the"               0.49
    " الرياضيه"         0.49
    "rer"               0.49
    "ng"                0.48
    "inal"              0.48
    "red"               0.47
    "k"                 0.47
    "NegativeButton"    0.47
    Activation Density: 0.060%

    No Known Activations