    Explanations

    negation or denial statements

    Negative Logits
    ".appspot"    -0.16
    ".prot"       -0.15
    "gree"        -0.15
    "_keeper"     -0.15
    "YC"          -0.15
    "iqueta"      -0.14
    "lemn"        -0.14
    "irie"        -0.14
    "lero"        -0.14
    "ntl"         -0.14
    Positive Logits
    "ono"          0.15
    "ore"          0.15
    "zz"           0.15
    "sp"           0.14
    " gender"      0.14
    "ody"          0.14
    "ove"          0.14
    "oks"          0.14
    "ella"         0.14
    " further"     0.13
    Activation Density: 0.065%

    No Known Activations