INDEX
    Explanations

    words related to negative or controversial actions or situations

    terms related to reputation and its implications

    Negative Logits
    sky         -0.65
    nova        -0.65
     Nig        -0.63
    thur        -0.60
     veins      -0.60
    danger      -0.60
    gray        -0.60
    erb         -0.59
     Mig        -0.59
     Sherman    -0.59
    Positive Logits
    ction       0.87
    enment      0.80
    ndum        0.79
    ctions      0.78
    ance        0.76
    eval        0.76
    issance     0.76
    atives      0.75
    ENCY        0.73
    essed       0.73
    Activation Density: 0.164%

    No Known Activations