INDEX
    Explanations

    HO, CO, or specific foreign syllables

    Negative Logits
        Token        Value
        "м"          0.81
        "ம்"          0.79
        "م"          0.77
        "ו"          0.71
        "я"          0.64
        " on"        0.64
        "н"          0.63
        "و"          0.61
        "ER"         0.60
        " belieb"    0.60
    Positive Logits
        Token                  Value
        "1"                    0.98
        "that"                 0.84
        "arı"                  0.84
        " that"                0.82
        " you"                 0.81
        "are"                  0.80
        (token not rendered)   0.80
        "ty"                   0.80
        "flav"                 0.80
        " of"                  0.79
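The negative and positive logit lists rank vocabulary tokens by how strongly this feature lowers or raises their logits. A common way to produce such lists for sparse-autoencoder features, assumed here rather than confirmed by this page, is to project the feature's decoder direction through the model's unembedding matrix and take the bottom and top k entries. The sketch below uses hypothetical names (`logit_effects`, `w_dec_row`, `w_u`) and toy data. It returns signed values, whereas the negative-logit list on this page shows unsigned numbers.

```python
import numpy as np


def logit_effects(w_dec_row: np.ndarray, w_u: np.ndarray, vocab: list[str], k: int = 10):
    """Return the k most boosted and k most suppressed tokens for one feature.

    Assumed shapes (hypothetical):
      w_dec_row: decoder direction of the feature, shape (d_model,)
      w_u:       unembedding matrix, shape (d_model, d_vocab)
    """
    # Direct effect of the feature direction on every token's logit.
    effects = w_dec_row @ w_u  # shape (d_vocab,)
    order = np.argsort(effects)  # ascending: most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return positive, negative


# Toy example with a 2-dimensional residual stream and a 3-token vocabulary.
vocab = ["x", "y", "z"]
w_u = np.array([[3.0, -2.0, 1.0], [0.0, 0.0, 0.0]])
pos, neg = logit_effects(np.array([1.0, 0.0]), w_u, vocab, k=1)
print(pos, neg)  # [('x', 3.0)] [('y', -2.0)]
```

Sorting the full effect vector once and reading both ends keeps the two lists consistent with each other.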
    Activation Density: 0.075%
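Activation density is the fraction of tokens on which the feature is active, so 0.075% corresponds to roughly 75 active tokens per 100,000. The minimal sketch below assumes a token counts as active when its activation is strictly above a hypothetical `threshold` (default 0) and takes a hypothetical array `acts` of per-token activations.

```python
import numpy as np


def activation_density(acts: np.ndarray, threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation is strictly above `threshold`."""
    return float(np.mean(acts > threshold))


# Toy data (hypothetical): 75 active tokens out of 100,000.
acts = np.zeros(100_000)
acts[:75] = 1.0
print(f"{activation_density(acts):.3%}")  # prints 0.075%
```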

    No Known Activations