    Negative Logits
    Token     Logit
    "ä"       1.16
    " to"     0.97
    "ка"      0.92
    "ки"      0.89
    "га"      0.83
    "க்கு"    0.81
    " by"     0.80
    "ך"       0.80
    "ня"      0.79
    "ని"      0.78
    Positive Logits
    Token     Logit
    "al"      1.79
    "m"       1.09
    "the"     1.05
    (no visible token)  1.04
    "ap"      1.03
    "r"       1.02
    "u"       1.00
    "ad"      0.93
    "J"       0.93
    "p"       0.92
    Activation Density: 0.011%

    No Known Activations