    Explanations

    declarative statements followed by a comparison or contrast

    occurrences of the word "None."

    Negative Logits
        Token          Logit
        " marking"     -0.65
        " lif"         -0.65
        " ideal"       -0.60
        " guys"        -0.60
        " planners"    -0.59
        " hips"        -0.58
        " resear"      -0.58
        "agers"        -0.58
        " tours"       -0.58
        "rave"         -0.58
    Positive Logits
        Token          Logit
        " None"        3.55
        "None"         2.54
        "none"         1.92
        " none"        1.79
        " Nothing"     1.59
        " Neither"     1.34
        " NULL"        1.32
        " False"       1.30
        "Nothing"      1.22
        " TBA"         1.18
    Activation Density 0.010%

    No Known Activations