INDEX
    Explanations

    references to societal norms and trends

    Negative Logits
    " two"            -0.24
    " entirety"       -0.21
    " three"          -0.19
    " possibility"    -0.19
    " chance"         -0.19
    " entire"         -0.18
    " presence"       -0.17
    " zwei"           -0.17
    "two"             -0.17
    " slightest"      -0.16
    Positive Logits
    " early"          0.19
    " stuff"          0.17
    " newer"          0.17
    "etter"           0.17
    " recent"         0.17
    " earlier"        0.16
    "ones"            0.16
    "htar"            0.15
    " things"         0.15
    " later"          0.15
    Act Density 0.094%

    No Known Activations