    Negative Logits
    Token        Logit
    "at"         0.68
    (missing)    0.63
    "6"          0.63
    "7"          0.62
    "4"          0.61
    "ות"         0.59
    "8"          0.59
    "علم"        0.58
    "2"          0.57
    "im"         0.56
    Positive Logits
    Token        Logit
    " a"         0.64
    " ג"         0.44
    " unteren"   0.43
    (missing)    0.43
    (missing)    0.41
    " а"         0.41
    " "          0.40
    " (\"        0.40
    " I"         0.39
    "이다"       0.39
    Activation Density: 0.299%

    No Known Activations