INDEX
    Explanations

    emotionally charged and impactful words or phrases

    abstract concepts related to morality, conflict, and the human experience

    Negative Logits
    CHAT          -0.78
    ificant       -0.74
    ificantly     -0.66
    zl            -0.62
    APD           -0.62
    azeera        -0.61
    eatures       -0.60
    ittees        -0.59
    ersen         -0.59
    oops          -0.58
    Positive Logits
    ankind        0.85
    lessness      0.78
     emanating    0.68
    thood         0.67
    fulness       0.66
     fame         0.65
     itself       0.65
    nesia         0.65
    beard         0.64
    ropy          0.63
    Activation Density: 0.466%

    No Known Activations