INDEX
    Explanations
    Negative Logits
     I    0.94
    ן    0.78
    ுக    0.73
    IFI    0.70
    ని    0.69
    ت    0.63
    IX    0.63
    др    0.62
    ره    0.61
    들이    0.61
    Positive Logits
    en    1.15
    er    0.98
    ad    0.93
    as    0.89
     proviso    0.85
    u    0.84
    on    0.82
    ために    0.81
    ak    0.79
    த்தில்    0.79
    Act Density 0.006%

    No Known Activations