    Negative Logits
    Token            Logit
    " I"             1.60
    "3"              1.17
    (not shown)      1.13
    "te"             1.03
    "án"             0.95
    "i"              0.95
    "um"             0.94
    "?"              0.88
    "that"           0.88
    "u"              0.86
    Positive Logits
    Token            Logit
    "nél"            1.21
    (not shown)      1.16
    " we"            0.94
    "("              0.94
    "nione"          0.92
    "=\"             0.89
    "nika"           0.88
    " "              0.88
    "ಗಳ"             0.88
    (not shown)      0.86
    Activation Density: 0.358%

    No Known Activations