INDEX
    Explanations

    negations or denials paired with adjectives

    phrases emphasizing absence or lack of something

    Negative Logits
    orks       -0.80
    flies      -0.74
    yx         -0.68
    fs         -0.68
    UME        -0.67
    rib        -0.66
    olds       -0.66
    haul       -0.66
    die        -0.64
     Chains    -0.64
    Positive Logits
     else             0.79
     hidden           0.77
     buried           0.72
     intrinsic        0.70
     objectionable    0.70
     overlap          0.70
     happening        0.69
     poetic           0.68
     lurking          0.68
     shameful         0.67
    Act Density 0.044%

    No Known Activations