    Explanations

    contractions and informal language

    references to collective experiences and social dynamics

    Negative Logits
    " Emerson"    -0.77
    " Binding"    -0.70
    " Lyons"      -0.69
    " Seymour"    -0.69
    " Bender"     -0.67
    " Alpine"     -0.66
    "pires"       -0.64
    " Keynes"     -0.63
    " Salmon"     -0.63
    " Swansea"    -0.63
    Positive Logits
    " don"        1.19
    "didn"        1.18
    "doesn"       1.15
    " aren"       1.13
    " didn"       1.12
    " shouldn"    1.12
    " ain"        1.12
    " wouldn"     1.10
    " DON"        1.08
    "don"         1.01
    Activation Density: 0.273%

    No Known Activations