INDEX
    Explanations

    phrases related to negative impacts or setbacks

    phrases indicating negative impacts or consequences

    Negative Logits
    ript      -0.81
    iosity    -0.68
    RY        -0.67
    uana      -0.67
    cius      -0.67
     Malays   -0.67
    ively     -0.65
    phis      -0.63
    ▬         -0.62
    ately     -0.62
    Positive Logits
    hole      1.04
    gun       0.93
    holes     0.92
    job       0.90
    hard      0.90
    pipe      0.88
    blow      0.88
    jobs      0.88
    outs      0.87
    waves     0.86
    Act Density 0.021%

    No Known Activations