    Explanations

    references to personal pronouns and their associated actions in context

    Negative Logits
    Token        Logit
    "isay"       -0.18
    "lus"        -0.16
    "loo"        -0.16
    "ubbo"       -0.15
    "lexible"    -0.15
    "dyn"        -0.15
    "argon"      -0.14
    "ковод"      -0.14
    "imas"       -0.14
    "rika"       -0.14
    Positive Logits
    Token        Logit
    "self"       0.22
    " mình"      0.22
    " self"      0.21
    " selves"    0.21
    "-self"      0.20
    " itself"    0.20
    "\tself"     0.19
    ":self"      0.18
    "=self"      0.18
    "elves"      0.17
    Act Density 0.036%

    No Known Activations