    Explanations

    negations, references to past actions, and statements of belief or assertion

    Negative Logits
    "emetery"            -0.77
    " millenn"           -0.72
    "OTAL"               -0.70
    "acebook"            -0.66
    "illion"             -0.65
    "ategory"            -0.65
    "animal"             -0.63
    "ugal"               -0.63
    "icultural"          -0.63
    "illions"            -0.63

    Positive Logits
    " Tsarnaev"          0.69
    " Sud"               0.64
    " Cube"              0.63
    "henko"              0.61
    "rams"               0.60
    " characterization"  0.60
    " Chak"              0.60
    "arnaev"             0.58
    " herself"           0.58
    " Wasserman"         0.58
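    Positive and negative logit lists like the ones above are commonly computed as the feature's direct effect on the output vocabulary: the feature's decoder direction is projected through the model's unembedding matrix, and the per-token scores are ranked. The sketch below assumes that definition (real dashboards may differ in details such as folding in layer-norm weights). The names `logit_effects`, `top_logits`, `decoder_dir`, `unembed` and the toy `vocab` are hypothetical and not taken from this feature.

```python
def logit_effects(decoder_dir, unembed):
    """Score each vocab token as the dot product of the feature's decoder
    direction (length d_model) with that token's unembedding column.
    `unembed` is a list of d_model rows, each of length vocab_size."""
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][t] for i in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]


def top_logits(decoder_dir, unembed, vocab, k):
    """Return (top_positive, top_negative) lists of (token, score) pairs."""
    scores = logit_effects(decoder_dir, unembed)
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]        # most negative score first
    positive = ranked[::-1][:k]  # most positive score first
    return positive, negative


# Hypothetical toy model: d_model = 2, vocabulary of 4 tokens.
vocab = ["a", " b", "c", " d"]
unembed = [
    [1.0, -1.0, 0.5, 0.0],
    [0.0, 0.5, -2.0, 1.0],
]
decoder_dir = [0.5, 0.25]

pos, neg = top_logits(decoder_dir, unembed, vocab, k=2)
print(pos)  # [('a', 0.5), (' d', 0.25)]
print(neg)  # [(' b', -0.375), ('c', -0.25)]
```

    Leading spaces are kept in the toy tokens for the same reason they appear in the tables above: with byte-pair tokenizers, " d" and "d" are distinct tokens.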
    Activation Density 0.800%
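    Activation density is usually reported as the percentage of sampled tokens on which the feature fires. A minimal sketch, assuming "fires" means a strictly positive activation; the function name `activation_density` and the `activations` sample are hypothetical, not data from this feature.

```python
def activation_density(activations):
    """Percentage of token positions with a strictly positive activation."""
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > 0)
    return 100.0 * active / len(activations)


# Hypothetical sample: 4 active tokens out of 500.
activations = [0.0] * 496 + [1.2, 0.3, 2.5, 0.7]
density = activation_density(activations)
print(f"{density:.3f}%")  # 0.800%
```

    Under this definition, a density of 0.800% means the feature is active on roughly 1 in every 125 tokens of the sample.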

    No Known Activations