    Explanations

    negations related to various subjects

    Negative Logits
    "accompan"        -0.69
    "生"              -0.68
    "planet"          -0.68
    "TERN"            -0.66
    "覚醒"            -0.65
    "complete"        -0.63
    "Reviewer"        -0.63
    "forms"           -0.62
    "larg"            -0.62
    " Britann"        -0.61

    Positive Logits
    " necessarily"     1.19
    " exactly"         1.06
    " bother"          0.94
    " quite"           0.92
    " really"          0.91
    " gotta"           0.85
    " gonna"           0.83
    "epad"             0.82
    " bluff"           0.81
    " even"            0.81
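    The logit lists above rank vocabulary tokens by how strongly this feature pushes their output logits up or down. Below is a minimal sketch of how such lists could be produced. It assumes that the effect on each token is the dot product of the feature's decoder direction with that token's unembedding column. The function names, matrix layout, and toy numbers are hypothetical illustrations, not the dashboard's actual code.

```python
from typing import Dict, List, Sequence, Tuple


def logit_effects(
    decoder_dir: Sequence[float],
    unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
) -> Dict[str, float]:
    """Project a (hypothetical) feature direction onto the unembedding.

    Assumption: `unembed` is laid out as d_model rows x len(vocab) columns,
    so column j is the unembedding vector of vocab[j].
    """
    return {
        tok: sum(decoder_dir[i] * unembed[i][j] for i in range(len(decoder_dir)))
        for j, tok in enumerate(vocab)
    }


def top_bottom(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, value) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    positive = ranked[::-1][:k]  # largest values first
    negative = ranked[:k]        # most negative values first
    return positive, negative


if __name__ == "__main__":
    # Toy 2-dimensional model with a hypothetical 3-token vocabulary.
    effects = logit_effects(
        [1.0, -1.0],
        [[0.5, 0.2, 0.0], [0.1, 0.4, -0.3]],
        ["a", "b", "c"],
    )
    print(top_bottom(effects, k=1))
```

    The full sort is simple but costs O(V log V) over the vocabulary. For a real vocabulary of tens of thousands of tokens, `heapq.nlargest` and `heapq.nsmallest` would return the same top-k pairs with less work.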
    Activation Density: 0.113%
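    Activation density is commonly defined as the fraction of tokens in a sample on which the feature fires. The sketch below assumes that "fires" means an activation strictly above zero; the dashboard's actual cutoff may differ. The function name and the sample data are hypothetical.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption: a token counts as active when its activation is strictly
    greater than `threshold` (zero by default).
    """
    total = 0
    fired = 0
    for a in activations:
        total += 1
        if a > threshold:
            fired += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return fired / total


if __name__ == "__main__":
    # Hypothetical sample: 113 firing positions out of 100,000 tokens.
    sample = [0.0] * (100_000 - 113) + [1.0] * 113
    print(f"{activation_density(sample):.3%}")
```

    A density of 0.113% means the feature is active on roughly one token in every 885 (1 / 0.00113 ≈ 885).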

    No Known Activations