INDEX
    Explanations

    the word "de" appearing with varying activation values, potentially indicating a specific keyword or concept

    instances of the word "de."

    Negative Logits
    allery     -0.84
    iggins     -0.68
    sit        -0.68
    icals      -0.67
    annis      -0.67
    hetti      -0.66
    hips       -0.66
    ieri       -0.66
     impulse   -0.65
    okin       -0.65
    Positive Logits
    ploy       1.38
    utsche     1.27
    cember     1.18
    leted      1.14
    legate     1.13
    legates    1.10
    bris       1.06
    cker       1.03
    hyde       1.03
    ktop       0.97
    Activation Density: 0.021%

    No Known Activations