INDEX
    Explanations

    strong negative emotions or hostility, particularly related to hatred

    references to hatred and its various expressions and implications

    Negative Logits
    Token         Logit
    "UNCH"        -0.83
    " helicop"    -0.82
    "ODE"         -0.80
    "umm"         -0.73
    "USE"         -0.69
    "å¸"          -0.67
    "aqu"         -0.66
    "glas"        -0.64
    "change"      -0.61
    "AMA"         -0.61
    Positive Logits
    Token         Logit
    " hatred"     0.95
    " towards"    0.90
    " prejudice"  0.83
    "yip"         0.83
    " toward"     0.82
    " vengeance"  0.82
    "ãĥĨ"         0.78
    "lessly"      0.78
    "wart"        0.77
    " rage"       0.74
    Activation Density: 0.029%

    No Known Activations