INDEX
    Explanations

    quotations enclosed in double quotes

    quotation marks and speech indicators in the text

    Negative Logits
        " adjud"            -0.81
        " cram"             -0.81
        " distribut"        -0.78
        " sway"             -0.78
        " derby"            -0.75
        " dominate"         -0.75
        " flared"           -0.75
        " developmental"    -0.75
        " schedule"         -0.74
        " favor"            -0.73
    Positive Logits
        "We"                1.32
        "Absolutely"        1.28
        "I"                 1.26
        "Our"               1.24
        "Whoever"           1.23
        "There"             1.22
        "You"               1.19
        "Everything"        1.19
        "Anything"          1.18
        "Everyone"          1.18
    Activation Density: 0.079%

    No Known Activations