    Negative Logits
    Token     Value
    " ("      0.70
    "("       0.68
    " that"   0.56
    "{"       0.54
    "that"    0.50
    ":<"      0.47
    " ():"    0.47
    ":"       0.46
    " (("     0.46
    " a"      0.45
    Positive Logits
    Token            Value
    "ي"              0.81
    "i"              0.80
    (not rendered)   0.67
    "ி"              0.61
    "و"              0.58
    (not rendered)   0.57
    "in"             0.56
    "י"              0.56
    (not rendered)   0.54
    "ി"              0.53
    Act Density 0.396%

    No Known Activations