    Explanations

    pronouns and their usage in sentences

    Head Attr Weights
    0:0.10
    1:0.03
    2:0.03
    3:0.06
    4:0.07
    5:0.04
    6:0.05
    7:0.02
    8:0.22
    9:0.25
    10:0.03
    11:0.03
    Negative Logits
    " Antar"          -1.81
    " Rutherford"     -1.74
    "rily"            -1.74
    "Mars"            -1.72
    " Hotel"          -1.70
    " Schwar"         -1.67
    " hotel"          -1.67
    "anwhile"         -1.65
    " Balloon"        -1.63
    " Patron"         -1.63
    Positive Logits
    "respect"         2.04
    "quest"           1.94
    "ribution"        1.90
    "politics"        1.85
    "izations"        1.84
    "erest"           1.84
    " considerations" 1.83
    "ogie"            1.82
    "phies"           1.81
    " ado"            1.78
    Act Density 0.001%

    No Known Activations