INDEX
    Explanations

    and/or followed by auxiliary

    NEGATIVE LOGITS
    "ون"  0.57
    "ه"  0.57
    "g"  0.56
    "ன்"  0.55
    "v"  0.52
    "d"  0.50
    "a"  0.50
    " Kabhi"  0.49
    [token not rendered]  0.49
    "დი"  0.48
    POSITIVE LOGITS
    "ы"  0.68
    "U"  0.55
    "ó"  0.54
    " it"  0.52
    " to"  0.52
    "?"  0.52
    " are"  0.51
    " esports"  0.50
    "َ"  0.50
    " y"  0.49
    Act Density 0.574%

    No Known Activations