INDEX
    Explanations

    phrases related to social and political actions or statements

    repetitive phrases or expressions highlighting quantifiers and negations

    Negative Logits
    Token             Logit
    "Redditor"        -0.79
    " Also"           -0.76
    " Alternatively"  -0.70
    "MAN"             -0.67
    " Additionally"   -0.66
    "also"            -0.66
    " additionally"   -0.65
    " Nare"           -0.64
    "Also"            -0.63
    "rm"              -0.63
    Positive Logits
    Token             Logit
    "whatever"        0.93
    "etc"             0.86
    " EntityItem"     0.75
    " clot"           0.68
    " etc"            0.66
    "cknow"           0.66
    " decency"        0.65
    " blah"           0.62
    " sensit"         0.62
    "dq"              0.61
    Activation Density: 0.260%

    No Known Activations