INDEX
    Explanations

    phrases related to being warned, surprised, challenged, or advised by others

    references to "us" or collective experiences and actions

    Negative Logits
    "fect"        -0.71
    "tein"        -0.70
    "lets"        -0.65
    " CPC"        -0.63
    " livest"     -0.62
    "ussen"       -0.60
    "ye"          -0.59
    " chaired"    -0.58
    "served"      -0.58
    "stick"       -0.57
    Positive Logits
    "selves"       1.17
    "hers"         1.10
    "ern"          0.93
    "aning"        0.90
    " ourselves"   0.84
    "leep"         0.82
    " selves"      0.80
    "urious"       0.79
    "ury"          0.78
    " eleph"       0.78
    Activation Density: 0.066%

    No Known Activations