    Explanations

    questions and statements addressing the reader or listener directly

    Negative Logits
    yrights      -0.83
    ces          -0.78
    inery        -0.73
    edIn         -0.72
    opens        -0.72
    ooks         -0.71
    ges          -0.69
    ruciating    -0.68
    tails        -0.68
    uces         -0.68
    Positive Logits
    ?'           1.06
    ?'"          1.04
     ever        1.01
    ?"           0.98
    ?            0.95
    ?)           0.94
    ?:           0.92
    ...?         0.90
    ?!           0.86
    ????         0.86
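    The two logit lists show the output tokens this feature most promotes (question marks in various quoting contexts, " ever") and most suppresses (mostly word-piece fragments). The sketch below shows how such lists are commonly derived. It assumes the feature's decoder direction is projected through the model's unembedding matrix and the resulting per-token scores are ranked. The helper names and toy vectors are hypothetical and are not taken from this page.

```python
from typing import Dict, List, Tuple

def logit_effects(decoder_dir: List[float], unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Dot the feature's decoder direction with each token's unembedding column.

    `unembed` maps a token string to its column of the (hypothetical) d_model x vocab
    unembedding matrix, so each score is that token's direct logit contribution.
    """
    return {tok: sum(d * u for d, u in zip(decoder_dir, col)) for tok, col in unembed.items()}

def top_and_bottom(effects: Dict[str, float], k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most-promoted and the k most-suppressed tokens."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by score
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # largest first
    return positive, negative

# Toy example with a 2-dimensional residual stream (values are made up).
decoder_dir = [1.0, -0.5]
unembed = {"?": [1.0, 0.0], "?!": [0.8, 0.2], "ces": [-0.5, 0.4], "ooks": [0.0, 1.0]}
pos, neg = top_and_bottom(logit_effects(decoder_dir, unembed), k=2)
print([t for t, _ in pos])  # ['?', '?!']
print([t for t, _ in neg])  # ['ces', 'ooks']
```

    Real pipelines may also fold in layer-norm scaling or bias terms before ranking; this sketch leaves those out.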
    Act Density 0.513%
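    Assuming Act Density is the share of sampled tokens on which the feature's activation is nonzero, 0.513% means the feature fires on roughly one token in 195 (100 / 0.513 ≈ 194.9). A minimal sketch of that calculation follows. The function name and the zero threshold are assumptions for illustration.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation exceeds `threshold` (assumed 0)."""
    if not activations:
        raise ValueError("need at least one token activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)

# One of four tokens is active, so the density is 25%.
print(activation_density([0.0, 0.0, 2.3, 0.0]))  # 25.0
```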

    No Known Activations