    NEGATIVE LOGITS
    Token       Value
    '           1.54
    其他        1.08
     was        1.04
    كان         1.01
    "           1.00
    (blank)     0.98
    కు          0.98
    (blank)     0.97
    ۹           0.96
     de         0.94
    POSITIVE LOGITS
    Token       Value
    ur          1.33
    in          1.18
    ле          1.07
    ul          1.05
    ம்          0.91
    ன்ஸ்        0.91
    ad          0.89
    (blank)     0.86
    ys          0.86
    ன்          0.84
    Activation Density: 0.008%

    No Known Activations