INDEX
    Explanations

    phrases related to actions and events

    phrases indicating evaluation or response to events

    Negative Logits
    Token         Logit
    " THAT"       -0.66
    " incent"     -0.60
    " folks"      -0.60
    " THESE"      -0.57
    "ixties"      -0.56
    " THIS"       -0.56
    "inese"       -0.56
    "ividual"     -0.56
    "inian"       -0.55
    "Those"       -0.55
    Positive Logits
    Token         Logit
    " it"         0.85
    " its"        0.79
    " Its"        0.65
    "Its"         0.64
    " theirs"     0.62
    "llah"        0.60
    "its"         0.59
    "uve"         0.59
    "ADRA"        0.57
    " Paddock"    0.57
    Activation Density: 1.010%

    No Known Activations