INDEX
    Explanations

    negations or phrases indicating a lack of something

    Negative Logits
    inand             -0.17
    aka               -0.15
     Unsupported      -0.15
    lew               -0.15
    undi              -0.14
     determination    -0.13
    iew               -0.13
    enko              -0.13
    anka              -0.13
    aston             -0.13
    Positive Logits
     sure             0.28
     anymore          0.27
     necessarily      0.27
     bud              0.26
     phased           0.26
     allowed          0.25
     anywhere         0.25
     exactly          0.24
     interested       0.24
     bothered         0.23
    Activation Density 0.122%

    No Known Activations