INDEX
    Explanations

    auxiliary verb or a noun following "the"

    Negative Logits
    "ة"           1.48
    "ش"           1.20
    "ف"           1.09
    "Из"          1.03
    "Κ"           1.02
    " LikeLike"   0.98
    "ت"           0.96
    "Во"          0.94
    "нные"        0.94
    " zariaden"   0.93

    Positive Logits
    "s"           1.17
    "g"           1.07
    "iv"          1.05
    "pagina"      1.02
    "ান"          0.96
    "mselves"     0.96
    "ுள்ளது"      0.96
    "p"           0.96
    " afirma"     0.95
    "िन"          0.94
    Activation Density: 0.261%

    No Known Activations