    Explanations

    references to consequences or serious outcomes related to actions

    Negative Logits
    Token         Logit
    'Gön'         -0.66
    'atown'       -0.55
    '">*</'       -0.54
    ' RATING'     -0.53
    ' Dress'      -0.53
    'indale'      -0.52
    'ulose'       -0.52
    'TAINMENT'    -0.52
    'ecutable'    -0.52
    'perity'      -0.52
    Positive Logits
    Token         Logit
    ' Either'     1.22
    ' Such'       1.20
    ' Then'       1.18
    ' Others'     1.17
    ' Both'       1.17
    'Then'        1.16
    ' Again'      1.15
    'Such'        1.15
    ' Other'      1.15
    'Both'        1.14
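    The page does not say how the positive and negative logit lists were computed. A common convention is to multiply the feature's decoder direction by the model's unembedding matrix, which gives one score per vocabulary token, and then keep the k highest and k lowest scores. The sketch below assumes that convention. The vocabulary, the decoder vector `d`, the matrix `W_U`, and the helper names `logit_effects` and `top_and_bottom` are hypothetical toy stand-ins, not values or code from this feature.

```python
from typing import List, Tuple

def logit_effects(decoder_dir: List[float], unembed: List[List[float]]) -> List[float]:
    """Project a (hypothetical) decoder direction of length d_model through an
    unembedding matrix shaped [d_model][vocab], giving one score per token."""
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][t] for i in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]

def top_and_bottom(tokens: List[str], scores: List[float], k: int) -> Tuple[list, list]:
    """Return the k highest- and k lowest-scoring (token, score) pairs."""
    pairs = sorted(zip(tokens, scores), key=lambda p: p[1])  # ascending by score
    negative = pairs[:k]        # most negative first
    positive = pairs[::-1][:k]  # most positive first
    return positive, negative

# Toy example (all values assumed): 2-dim residual stream, 4-token vocabulary.
tokens = [" Either", " Such", "Gön", "atown"]
W_U = [
    [1.0, 0.5, -0.5, -0.2],
    [0.2, 0.4, -0.1, -0.6],
]
d = [1.0, 1.0]

scores = logit_effects(d, W_U)
pos, neg = top_and_bottom(tokens, scores, k=2)
print("positive:", pos)
print("negative:", neg)
```

    Note that the leading space is part of the token, which is why ' Then' and 'Then' appear as separate entries in the lists.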
    Activation Density: 2.716%
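    Activation density is usually read as the fraction of sampled tokens on which the feature fires. At 2.716%, that is roughly 27 of every 1,000 tokens. The page does not state the sample or the firing threshold, so the sketch below assumes "fires" means an activation strictly greater than 0. The sample of 100,000 tokens with 2,716 nonzero activations is made up to reproduce the figure; the helper name `activation_density` is hypothetical.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose feature activation exceeds `threshold` (assumed 0)."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)

# Hypothetical sample: 100,000 tokens, the feature fires on every 30th of the first 81,480.
acts = [0.0] * 100_000
for i in range(0, 2716 * 30, 30):
    acts[i] = 1.5  # arbitrary positive activation value

density = activation_density(acts)
print(f"{density:.3%}")
```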

    No Known Activations