INDEX
    Explanations

    references to a particular individual's attributes or actions

    Negative Logits
    Token           Logit
    " Monfieur"     -0.68
    "נטרנט"         -0.61
    " Eul"          -0.60
    " Agamemnon"    -0.60
    "Jegyzetek"     -0.59
    "allez"         -0.59
    " Allez"        -0.59
    " تبد"          -0.58
    " Dede"         -0.58
    "ally"          -0.58
    Positive Logits
    Token     Logit
    " his"    1.68
    " HIS"    1.51
    " her"    1.45
    "HIS"     1.43
    " His"    1.42
    "His"     1.30
    "his"     1.28
    " own"    1.25
    " Her"    1.23
    "Her"     1.08
    Act Density 0.147%

    No Known Activations