    Explanations

    repeated phrases indicating inclusivity or universality

    Negative Logits
        Token      Logit
        eworthy    -0.17
        ulate      -0.17
        midi       -0.17
        side       -0.15
        785        -0.15
        ãi         -0.15
        ioned      -0.14
        atcher     -0.14
        nel        -0.14
        elines     -0.14

    Positive Logits
        Token      Logit
        ones       0.21
        THING      0.19
        /all       0.18
        hone       0.18
        thin       0.17
        though     0.17
        where      0.16
        ong        0.16
        theless    0.16
        -other     0.16

    Activation Density: 0.085%

    No Known Activations