INDEX
    Explanations

    pronouns referring to a specific male individual

    references to a specific subject, indicating a focus on a single male character throughout the text

    Negative Logits
    Token           Logit
    " Peak"         -0.69
    " Killing"      -0.67
    "reshold"       -0.65
    "earch"         -0.63
    "keleton"       -0.60
    "peak"          -0.59
    "Interest"      -0.59
    " Girls"        -0.59
    "htaking"       -0.59
    "Temperature"   -0.58
    Positive Logits
    Token           Logit
    "'d"            1.28
    "'ll"           1.20
    " wrote"        1.01
    "zbollah"       0.95
    "eded"          0.93
    " tweeted"      0.89
    "resy"          0.87
    " ported"       0.86
    "'ve"           0.85
    "pherd"         0.84
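    The page does not say how the logit lists above were produced. A common approach, assumed here, is to project the feature's decoder direction through the model's unembedding matrix and rank vocabulary tokens by the resulting score: the highest scores are the tokens the feature promotes, the lowest are the ones it suppresses. The sketch below is a minimal pure-Python version of that idea; the function name `top_logit_effects` and the toy matrices are hypothetical and are not taken from any particular library.

```python
def top_logit_effects(decoder_dir, W_U, vocab, k=10):
    """Rank tokens by how strongly a feature's decoder direction pushes their logits.

    Assumption (hypothetical helper): scores = decoder_dir @ W_U.
    decoder_dir: list of d_model floats (the feature's output direction)
    W_U: list of d_model rows, each a list of vocab_size floats
    vocab: list of vocab_size token strings
    Returns (top_positive, top_negative) as lists of (token, score) pairs.
    """
    if len(decoder_dir) != len(W_U):
        raise ValueError("decoder_dir length must equal the number of rows in W_U")
    vocab_size = len(vocab)
    scores = [0.0] * vocab_size
    # Accumulate the dot product of decoder_dir with every column of W_U.
    for weight, row in zip(decoder_dir, W_U):
        for j in range(vocab_size):
            scores[j] += weight * row[j]
    # Sort ascending by score: the most suppressed tokens come first.
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    top_negative = ranked[:k]
    top_positive = ranked[::-1][:k]
    return top_positive, top_negative


# Toy example with a 2-dimensional "model" and a 3-token vocabulary.
pos, neg = top_logit_effects(
    [1.0, -0.5],
    [[0.2, -0.1, 0.4], [0.0, 0.6, -0.2]],
    ["'d", " Peak", " wrote"],
    k=1,
)
print(pos, neg)
```

    Token strings are kept verbatim, including leading spaces, because subword tokenizers treat " Peak" and "peak" as different tokens; that is why both appear separately in the negative list.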
    Activation Density: 0.254%
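    Activation density is usually read as the share of sampled token positions at which the feature fires. Assuming "fires" means an activation strictly above zero (the page does not state its threshold or sample size), 0.254% corresponds to roughly 254 firing positions per 100,000 tokens. The helper below, `activation_density`, is a hypothetical sketch of that calculation.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of token positions where a feature's activation exceeds threshold.

    Assumption: a position counts as active when activation > threshold (default 0.0).
    """
    if not activations:
        raise ValueError("need at least one activation value")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# 254 active positions out of 100,000 gives a density of about 0.254%.
sample = [0.0] * 99_746 + [1.0] * 254
print(round(activation_density(sample), 3))
```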

    No Known Activations