INDEX
    Explanations

    phrases related to personal experiences and emotions in a narrative context

    NEGATIVE LOGITS
     “    -2.32
    -2.27
    ’”    -2.25
     ‘    -2.24
    ’.    -2.22
    -2.21
    ’,    -2.20
    .’    -2.19
    ’)    -2.19
    ’).    -2.14
    POSITIVE LOGITS
     "    1.82
    '    1.72
    。"    1.67
     '    1.58
    "    1.56
    '"    1.45
     ("    1.39
    ,"    1.38
    ..."    1.34
    :"    1.31
    Activation Density: 1.433%

    No Known Activations