INDEX
    Explanations

    expressions of agency and opportunity for engagement

    Negative Logits
    Token       Logit
    ".Ui"       -0.16
    "avax"      -0.16
    "mania"     -0.15
    "ropoda"    -0.15
    "ndef"      -0.15
    " oku"      -0.15
    "Ĥ¨"        -0.15
    "Ìī"        -0.15
    "arium"     -0.15
    "ngr"       -0.14
    Positive Logits
    Token           Logit
    " themselves"   0.39
    "Their"         0.20
    " thems"        0.20
    " Their"        0.19
    " their"        0.19
    " flock"        0.17
    " alike"        0.16
    "their"         0.16
    "pei"           0.16
    " vers"         0.16
    Activation Density: 0.082%

    No Known Activations