    Negative Logits
    Token    Value
    "ي"      0.77
    "_"      0.70
    "ست"     0.63
    "2"      0.63
    "9"      0.63
    "р"      0.61
    "i"      0.61
    "6"      0.61
    "8"      0.59
    "يلا"    0.59
    Positive Logits
    Token        Value
    "speople"    0.68
    ","          0.59
    " the"       0.56
    " or"        0.54
    "t"          0.54
    " an"        0.52
    " 전혀"      0.50
    "daki"       0.49
    "dw"         0.49
    "nsan"       0.49
    Activation Density: 0.007%

    No Known Activations