INDEX
    Explanations

    instances of the word "and."

    instances of special characters or formatting in the text

    Negative Logits
    bub        -0.59
    Leaks      -0.52
    egu        -0.51
    hub        -0.51
    .*         -0.51
    seat       -0.51
    hoe        -0.50
     recogn    -0.48
     foul      -0.47
    —-         -0.47
    Positive Logits
    romeda     0.99
    rogens     0.99
    rew        0.96
    ERSON      0.94
    rogen      0.87
    then       0.71
    rost       0.71
    alus       0.69
    rea        0.67
     secondly  0.67
    Act Density 0.063%

    No Known Activations