INDEX
    Explanations

    instances of the word "on."

    Negative Logits
    Token     Logit
    ei        -0.23
    tone      -0.22
    eing      -0.20
    een       -0.19
    ton       -0.19
    ted       -0.18
    uality    -0.18
    ty        -0.18
    e         -0.18
    tual      -0.18
    Positive Logits
    Token       Logit
    imbus       0.26
    ymous       0.25
    ics         0.25
    ese         0.23
    nection     0.23
    ucle        0.23
    uevo        0.23
    nement      0.22
    etwork      0.22
    ascimento   0.22
    Activation Density: 0.171%

    No Known Activations