    Explanations

    concepts related to identity and self-expression

    Negative Logits
    Token          Logit
    "nen"          -0.18
    "wert"         -0.17
    "ampion"       -0.15
    "nesia"        -0.14
    "acht"         -0.14
    "lernen"       -0.14
    "esterday"     -0.14
    "ipop"         -0.13
    " fold"        -0.13
    "iment"        -0.13
    Positive Logits
    Token          Logit
    " personal"     0.25
    "personal"      0.21
    " Personal"     0.20
    "Personal"      0.20
    " self"         0.18
    "個人"          0.17
    " pesso"        0.17
    " лич"          0.17
    "_self"         0.17
    " Self"         0.17
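    The page does not say how the logit lists are produced. A common approach for sparse-autoencoder feature dashboards, assumed here, is to project the feature's decoder direction through the model's unembedding matrix and keep the k most negative and k most positive entries. The helper `top_logit_effects`, the names `decoder_vec` and `W_U`, and the toy numbers are all hypothetical, not this feature's actual weights.

```python
import numpy as np

def top_logit_effects(decoder_vec, W_U, vocab, k=10):
    """Hypothetical helper (assumed method): project a feature's decoder
    direction onto the unembedding matrix and rank tokens by the result."""
    scores = decoder_vec @ W_U            # one score per vocabulary token
    order = np.argsort(scores)            # token indices, ascending by score
    negative = [(vocab[i], float(scores[i])) for i in order[:k]]
    positive = [(vocab[i], float(scores[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example: a 2-dimensional residual stream and a 4-token vocabulary.
vocab = [" personal", " self", "nen", "wert"]
decoder_vec = np.array([1.0, 0.0])
W_U = np.array([[0.25, 0.18, -0.18, -0.17],
                [0.0, 0.0, 0.0, 0.0]])
negative, positive = top_logit_effects(decoder_vec, W_U, vocab, k=2)
print("negative:", negative)
print("positive:", positive)
```

    Sorting once with `argsort` and slicing from both ends yields both lists from a single pass over the scores.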
    Act Density 0.297%
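    "Act Density" is assumed here to be the percentage of token positions on which the feature's activation is above zero. The helper `activation_density` and the toy activations below are hypothetical and only illustrate that assumed definition.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Hypothetical helper: percentage of token positions whose
    activation is strictly greater than `threshold`."""
    acts = np.asarray(activations)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size

# Toy example: the feature fires on 3 of 1,000 token positions.
acts = np.zeros(1000)
acts[:3] = 0.5
density = activation_density(acts)
print(f"{density:.3f}%")
```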

    No Known Activations