    Explanations

    now, then contrasting reality

    NEGATIVE LOGITS
    ون      0.75
    ро      0.70
    ان      0.67
    و       0.65
            0.64
    на      0.63
            0.63
            0.63
    ம்      0.61
    u       0.59
    POSITIVE LOGITS
     are     0.81
     of      0.80
     t       0.74
     on      0.74
     at      0.66
             0.64
     ت       0.64
     com     0.62
     was     0.62
     is      0.60
    Act Density 0.002%

    No Known Activations