INDEX
    Explanations

    phrases related to expectations or obligations

    normative or expected actions and behaviors

    Negative Logits
    Token           Logit
    " Finder"       -0.76
    "Reviewer"      -0.72
    "fortunately"   -0.69
    "Reader"        -0.67
    " Appears"      -0.66
    "lip"           -0.64
    "aroo"          -0.64
    "river"         -0.63
    "DAQ"           -0.62
    "clips"         -0.61
    Positive Logits
    Token           Logit
    " uphold"       0.87
    " behave"       0.86
    " be"           0.86
    " compensate"   0.84
    " stick"        0.83
    "ulhu"          0.80
    "wered"         0.80
    " abide"        0.80
    " deflect"      0.80
    " steer"        0.79
    Activation Density: 0.064%

    No Known Activations