INDEX
    Explanations

    personal pronouns and possessive determiners

    Negative Logits
    " maneu"         -1.34
    " accla"         -1.32
    " desir"         -1.29
    " laun"          -1.28
    " depic"         -1.28
    " effe"          -1.28
    " secon"         -1.26
    " wien"          -1.25
    " fuf"           -1.25
    " fortn"         -1.24

    Positive Logits
    "<bos>"           0.97
    " teachings"      0.64
    " kindness"       0.63
    " latest"         0.61
    " generosity"     0.59
    " guidance"       0.59
    "s"               0.58
    " words"          0.57
    " advice"         0.56
    " approval"       0.56

    Act Density 0.270%

    No Known Activations