INDEX
    Explanations

    phrases indicating independence or self-sufficiency

    Negative Logits
    Token                   Logit
    "Own"                   -0.18
    " own"                  -0.18
    " theirs"               -0.18
    " Own"                  -0.17
    "own"                   -0.17
    " Yours"                -0.17
    "OWN"                   -0.16
    " yours"                -0.16
    "longleftrightarrow"    -0.15
    " poc"                  -0.14
    Positive Logits
    Token                   Logit
    "enor"                  0.15
    "elsing"                0.15
    "ocket"                 0.14
    "ubb"                   0.14
    "orthand"               0.14
    "chas"                  0.14
    "endid"                 0.13
    "lys"                   0.13
    "osa"                   0.13
    "cha"                   0.13
    Activation Density: 0.045%

    No Known Activations