INDEX
    Explanations

    phrases expressing strong negative emotions or criticisms towards others

    expressions of regret or disgrace towards individuals or groups

    Negative Logits
    " livest"           -0.79
    "llan"              -0.75
    " stabilization"    -0.75
    "Downloadha"        -0.74
    "nels"              -0.73
    "minster"           -0.72
    "perature"          -0.71
    "atonin"            -0.70
    "combe"             -0.69
    "ernand"            -0.69
    Positive Logits
    " taxpayers"        0.84
    "ãĤ®"               0.82
    "ij士"              0.81
    " cheated"          0.80
    " victims"          0.80
    " inflicted"        0.79
    "da"                0.77
    " humanity"         0.73
    " anyone"           0.73
    " shareholders"     0.72
    Act Density 0.298%

    No Known Activations