INDEX
    Explanations

    compliments and positive feedback

    mentions of personal attributes or opinions related to people

    Negative Logits
    Token           Logit
    "actionDate"    -0.73
    "kefeller"      -0.66
    " Mehran"       -0.66
    " srfAttach"    -0.65
    " lifelong"     -0.65
    "ptive"         -0.64
    "eros"          -0.60
    "ordinary"      -0.59
    "¬¼"            -0.58
    "女"            -0.57
    Positive Logits
    Token           Logit
    " seem"         1.37
    " seems"        1.21
    " seemed"       1.21
    " mentioned"    1.01
    " clearly"      0.99
    " hinted"       0.98
    " evidently"    0.97
    " wisely"       0.97
    " sounded"      0.94
    " kindly"       0.94
    Activation Density: 0.721%

    No Known Activations