    Negative Logits
    Token            Value
    " in"            0.67
    (no token text)  0.66
    (no token text)  0.64
    " في"            0.61
    " e"             0.58
    " is"            0.57
    (no token text)  0.57
    " antara"        0.56
    " و"             0.55
    " عن"            0.55
    Positive Logits
    Token            Value
    "y"              0.47
    " Variation"     0.45
    " That"          0.44
    "t"              0.43
    "'."             0.42
    " themselves"    0.39
    " conspiring"    0.39
    " Who"           0.39
    "tım"            0.39
    "यों"             0.38
    Activation Density: 0.266%

    No Known Activations