INDEX
    Explanations

    mentions of the word "no" followed by various phrases

    negation or denial phrases

    Negative Logits
    Token             Logit
    " nonetheless"    -0.68
    "minster"         -0.63
    "ially"           -0.63
    "lus"             -0.59
    " Cathy"          -0.58
    " nevertheless"   -0.58
    "ATIVE"           -0.57
    "iership"         -0.57
    "turned"          -0.56
    "RED"             -0.56
    Positive Logits
    Token             Logit
    "vel"             0.99
    "zzle"            0.93
    "otrop"           0.86
    "obs"             0.83
    "xious"           0.83
    " longer"         0.81
    "vell"            0.81
    "except"          0.80
    " warranties"     0.79
    "isy"             0.79
    Activation Density: 0.055%

    No Known Activations