INDEX
    Explanations

    mentions of male pronouns

    references to a specific individual or pronouns related to a person

    Negative Logits
    Token         Logit
    "history"     -0.66
    " Manip"      -0.65
    "Dialogue"    -0.65
    " Interest"   -0.62
    "change"      -0.60
    " Combine"    -0.60
    "Beta"        -0.60
    " Gems"       -0.60
    " Jugg"       -0.59
    "ylon"        -0.58
    Positive Logits
    Token         Logit
    " Majesty"    1.06
    "ctor"        0.87
    "zbollah"     0.85
    "panic"       0.83
    "eded"        0.81
    "uristic"     0.79
    "ufact"       0.77
    "bert"        0.77
    "eding"       0.75
    "aney"        0.73
    Activation Density: 0.404%

    No Known Activations