INDEX
    Explanations

    terms related to self-description or self-evaluation

    Negative Logits
    " Jensen"     -0.16
    "lj"          -0.16
    "_acquire"    -0.15
    " fixed"      -0.15
    "पन"          -0.14
    "abwe"        -0.14
    "agner"       -0.14
    "tie"         -0.14
    " hung"       -0.14
    "akest"       -0.14
    Positive Logits
    "/self"       0.38
    " Self"       0.29
    "Self"        0.28
    "(Self"       0.27
    " self"       0.26
    "self"        0.26
    "SELF"        0.25
    "-self"       0.24
    " SELF"       0.24
    "=self"       0.22
    Activation Density: 0.030%

    No Known Activations