INDEX
    Explanations

    references to individuals, specifically personal pronouns and related terms

    Negative Logits
    Token        Logit
    "$MESS"      -0.17
    "$LANG"      -0.16
    "hausen"     -0.15
    " Either"    -0.15
    "nor"        -0.14
    "CS"         -0.14
    "streams"    -0.14
    " sami"      -0.13
    "IE"         -0.13
    ".SDK"       -0.13
    Positive Logits
    Token        Logit
    "/her"       0.65
    "/she"       0.54
    ".her"       0.39
    "her"        0.39
    " hers"      0.34
    " her"       0.31
    "Her"        0.29
    "/h"         0.29
    "panic"      0.28
    " she"       0.28
    Act Density 0.093%

    No Known Activations