INDEX
    Explanations

    negations or refusals

    phrases emphasizing negation or the absence of something

    Negative Logits
    "urous"        -0.64
    "essen"        -0.60
    " gaze"        -0.59
    "tle"          -0.59
    "aus"          -0.58
    "irds"         -0.57
    "eline"        -0.57
    "heid"         -0.57
    " ambition"    -0.56
    "ilde"         -0.55
    Positive Logits
    " NOT"         3.37
    "NOT"          2.22
    " NEVER"       1.98
    " ONLY"        1.79
    " ALWAYS"      1.70
    " ALSO"        1.69
    " WITHOUT"     1.51
    " THEN"        1.50
    " VERY"        1.50
    " REALLY"      1.46
    Act Density 0.010%

    No Known Activations