INDEX
    Explanations

    repetitive phrases that emphasize the word "all."

    Negative Logits
    Token           Logit
    "resi"          -0.15
    "ÙĨب"           -0.15
    "ãģķãĤĵãģ®"     -0.15
    "Ùħج"           -0.15
    "untu"          -0.15
    "asin"          -0.14
    "afb"           -0.14
    "anzi"          -0.14
    "VML"           -0.14
    "stants"        -0.14

    Positive Logits
    Token           Logit
    " way"          0.30
    " they"         0.25
    "thew"          0.24
    " away"         0.22
    "way"           0.20
    "away"          0.20
    "anter"         0.19
    " the"          0.18
    "ll"            0.18
    "they"          0.18

    Activation Density: 0.013%

    No Known Activations