INDEX
    Explanations

    words related to strong negative emotions, specifically disgust and horror

    expressions of strong negative emotions, particularly disgust and horror

    Negative Logits
    arta      -0.81
    ingham    -0.75
    ieth      -0.75
    ept       -0.68
    etheus    -0.67
    uin       -0.66
    pec       -0.65
    pler      -0.65
    ilt       -0.64
    arial     -0.63
    Positive Logits
     Zucker       0.85
    ĸļ            0.80
    ptin          0.73
     disgusted    0.70
    ingly         0.68
     disgust      0.68
     Viz          0.68
    fur           0.67
    lihood        0.67
    ::::::::      0.65
    Activation Density 0.041%

    No Known Activations