INDEX
    Explanations

    sentences containing negations, particularly focusing on negations with high confidence

    negations and phrases indicating something is not true or does not exist

    Negative Logits
    "arta"                -0.87
    "=-=-=-=-=-=-=-=-"    -0.77
    " VIDEOS"             -0.69
    "Dialogue"            -0.68
    " newsletters"        -0.68
    "anon"                -0.66
    "mare"                -0.66
    "ologies"             -0.66
    " NOW"                -0.63
    " certs"              -0.63
    Positive Logits
    " conceived"          1.04
    " unsuccessful"       1.04
    " originally"         1.02
    " instrumental"       0.99
    " initially"          0.98
    " born"               0.93
    " successful"         0.91
    " intended"           0.87
    " able"               0.86
    " supposed"           0.83
    Activation Density: 0.283%

    No Known Activations