    Explanations

    words related to self-referential concepts and actions

    Negative Logits
    Token        Logit
    "abwe"       -0.16
    "ahoma"      -0.16
    "iform"      -0.15
    "akest"      -0.14
    "geb"        -0.14
    "argent"     -0.14
    " Jensen"    -0.14
    "tie"        -0.14
    "idia"       -0.13
    "éĪ"         -0.13
    Positive Logits
    Token        Logit
    "/self"      0.41
    " Self"      0.33
    "Self"       0.30
    " self"      0.29
    "self"       0.28
    "(Self"      0.28
    " SELF"      0.27
    "-self"      0.26
    "SELF"       0.25
    "=self"      0.23
    Activation Density: 0.031%
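
The activation density figure above is most naturally read as the share of sampled tokens on which the feature fires at all. The sketch below illustrates that reading only; the page does not state how the number was computed, and the function names `activation_density` and `as_percent`, the `threshold` parameter, and the toy input are all hypothetical.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens with a non-zero
    activation. The source page does not confirm this definition.
    """
    values = list(activations)
    if not values:
        return 0.0
    fired = sum(1 for a in values if a > threshold)
    return fired / len(values)


def as_percent(density: float) -> str:
    """Format a fraction as a percentage with three decimal places."""
    return f"{100 * density:.3f}%"


# Toy data (not taken from the page): 3 firing tokens out of 10,000.
toy_activations = [0.0] * 10_000
for position in (12, 480, 9_001):
    toy_activations[position] = 1.5

print(as_percent(activation_density(toy_activations)))
```

Under this reading, a density of 0.031% would correspond to roughly 3 firing tokens per 10,000 sampled.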

    No Known Activations