INDEX
    Explanations

    emphasizing words that enhance or reinforce personal feelings or opinions

    Negative Logits
    's            -0.36
    ’s            -0.30
    'S            -0.27
     be           -0.27
    ´s            -0.25
     himself      -0.23
    ’S            -0.22
     themselves   -0.22
     herself      -0.21
    `s            -0.21
    Positive Logits
     can      0.34
     cannot   0.34
     need     0.32
     aren     0.30
     are      0.26
     don      0.26
     want     0.25
    need      0.24
     must     0.24
     haven    0.24
    Act Density 0.105%

    No Known Activations