INDEX
    Explanations

    negations and expressions of contradiction

    NEGATIVE LOGITS
    " be"       -0.40
    " Be"       -0.32
    "(be"       -0.29
    "be"        -0.29
    "Be"        -0.27
    ".Be"       -0.23
    "(Be"       -0.21
    "-be"       -0.21
    "/be"       -0.20
    "\tbe"      -0.20
    POSITIVE LOGITS
    " need"      0.24
    " seem"      0.21
    "need"       0.19
    " belong"    0.19
    "NotExist"   0.18
    " deserve"   0.18
    " Need"      0.18
    " tend"      0.17
    " have"      0.16
    " care"      0.16
    Act Density 0.214%

    No Known Activations