INDEX
    Explanations

    pronouns 'it' and 'he' in sentences

    Negative Logits
    Token       Logit
    hips        -0.60
    itaire      -0.55
     paran      -0.51
     funer      -0.50
    dding       -0.49
     -----      -0.49
     Friendly   -0.49
    izable      -0.49
    Priv        -0.49
    idon        -0.49
    Positive Logits
    Token       Logit
    zbollah     0.86
    self        0.86
    unes        0.83
    chy         0.80
    chwitz      0.79
    alian       0.76
    iner        0.76
    asca        0.75
    anium       0.72
     seems      0.72
    Act Density 0.240%

    No Known Activations