INDEX
    Explanations

    instances of personal pronouns and identity-related language

    Negative Logits
    mel           -0.16
    ander         -0.15
    anie          -0.14
     McInt        -0.14
    vir           -0.14
     Merrill      -0.14
    ida           -0.14
    occo          -0.13
    bour          -0.13
     synchron     -0.13
    Positive Logits
     eventually   0.28
     Eventually   0.26
     eventual     0.24
    Eventually    0.23
     Initially    0.20
     initially    0.19
     gradually    0.19
    Initially     0.19
     sooner       0.17
    寻            0.17
    Activation Density: 0.005%

    No Known Activations