INDEX
    Explanations

    phrases related to self-reflection and introspection

    Negative Logits
    Token         Logit
     Juf          -1.79
     stockholm    -1.77
     dises        -1.69
     lidl         -1.66
     lyon         -1.63
     wien         -1.61
     leonardo     -1.61
     squa         -1.59
     frankfurt    -1.59
     jorge        -1.58
    Positive Logits
    Token         Logit
    <bos>         1.35
     definitely   0.79
     actually     0.72
     really       0.71
     my           0.70
     pretty       0.70
     probably     0.69
     very         0.69
     I            0.69
     honestly     0.68
    Activation Density: 0.338%

    No Known Activations