INDEX
    Explanations

    phrases related to evidence or proof

    negative phrases related to validity or evidence

    Negative Logits
    Token         Logit
    " Klux"       -0.79
    " Dos"        -0.78
    " Norris"     -0.67
    " AVG"        -0.63
    " Typhoon"    -0.58
    " Dame"       -0.58
    "iper"        -0.58
    " subpoen"    -0.57
    " Kru"        -0.57
    " sts"        -0.57
    Positive Logits
    Token         Logit
    "based"       1.16
    "driven"      0.99
    "laden"       0.98
    "oriented"    0.98
    "bearing"     0.95
    "matter"      0.94
    "seeking"     0.93
    "of"          0.92
    "heavy"       0.92
    "rich"        0.91
    Activation Density: 0.078%

    No Known Activations