INDEX
    Explanations

    personal pronouns specifically related to addressing the reader

    Negative Logits
    andon         -0.15
    osite         -0.14
    Fcn           -0.14
    atto          -0.14
    dde           -0.14
     Bilim        -0.14
    bjerg         -0.14
    ilestone      -0.14
     assorted     -0.14
    lé            -0.13

    Positive Logits
     Already       0.19
     suspect       0.19
     already       0.18
    457            0.18
    Already        0.17
     lucky         0.17
    696            0.16
    maal           0.16
     plan          0.15
     absolutely    0.15
    Activation Density: 0.090%

    No Known Activations