INDEX
    Explanations

    references to emotions and reactions related to thoughts and actions

    Negative Logits
    " ourselves"   -0.71
    " we"          -0.67
    "让我们"        -0.66
    "讓我們"        -0.59
    " нами"        -0.57
    " yourselves"  -0.56
    " vimos"       -0.55
    "weil"         -0.55
    "we"           -0.54
    "我们在"        -0.54
    Positive Logits
    " Slowly"      0.85
    " Glan"        0.78
    "Slowly"       0.70
    " “"           0.69
    " Carefully"   0.69
    " Sigh"        0.65
    " Maybe"       0.65
    " Surely"      0.65
    " Turning"     0.64
    " Something"   0.64
    Activation Density: 0.113%

    No Known Activations