    Explanations

    personal pronouns indicating possession

    phrases that reference individuals and their personal attributes or actions

    Negative Logits
    Reviewed      -0.80
    γ             -0.77
    Ïī            -0.75
    Downloadha    -0.73
    —-            -0.71
    models        -0.70
    Æ             -0.69
    reddit        -0.69
    ÙIJ           -0.69
    needed        -0.69
    Positive Logits
     wife          1.11
     eldest        1.10
     hobbies       1.08
     nickname      1.07
     father        1.05
     biography     1.04
     motto         1.03
     surname       1.02
     daughter      1.01
     foray         1.00
    Activation Density: 0.203%

    No Known Activations