INDEX
    Explanations

    descriptions or statements involving people

    pronouns and references to people in sentences

    Negative Logits
     �
    -0.68
     ''
    -0.63
     guiActiveUnfocused
    -0.59
     ›
    -0.57
     ``
    -0.56
     Tau
    -0.56
    覚醒
    -0.56
     prompting
    -0.54
     exclaim
    -0.53
     β
    -0.52
    Positive Logits
    %"
    1.02
    withstanding
    0.99
    "—
    0.97
    resa
    0.92
    itage
    0.92
    odore
    0.90
    pherd
    0.90
    "[
    0.87
    chwitz
    0.86
    ntil
    0.85
    Act Density 0.334%

    No Known Activations