INDEX
    Explanations

    references to personal identity and relationships

    Negative Logits
    " itself"     -0.18
    "Indented"    -0.15
    "bor"         -0.15
    "ema"         -0.14
    "urma"        -0.14
    "/effects"    -0.14
    " himself"    -0.14
    "Ïİν"         -0.14
    "arlo"        -0.14
    "rale"        -0.14
    Positive Logits
    " differently"    0.17
    " face"           0.17
    " again"          0.16
    " perform"        0.16
    " doing"          0.15
    " coming"         0.15
    " DJ"             0.15
    "/her"            0.14
    "pard"            0.14
    " smile"          0.14
    Act Density 0.049%

    No Known Activations