INDEX
    Explanations

    first-person pronouns and references to personal experiences

    Negative Logits
    " themselves"   -0.27
    " their"        -0.20
    "Their"         -0.20
    " Their"        -0.19
    "their"         -0.18
    " leurs"        -0.17
    "holm"          -0.17
    "reuse"         -0.16
    "ij"            -0.15
    "osi"           -0.15
    Positive Logits
    " alike"        0.17
    "ago"           0.16
    "AO"            0.15
    "onor"          0.14
    "iena"          0.14
    "edy"           0.14
    "ags"           0.14
    "zelf"          0.14
    "eren"          0.14
    "/plugin"       0.13
    Activation Density: 0.087%

    No Known Activations