INDEX
    Explanations

    phrases related to identity and self-perception

    Negative Logits
    uros
    -0.17
    owell
    -0.15
    ignon
    -0.15
    ungan
    -0.14
    ACA
    -0.14
    gewater
    -0.14
    bourg
    -0.14
    řev
    -0.14
    rane
    -0.14
    ực
    -0.13
    Positive Logits
    morph
    0.16
    eras
    0.16
    になり
    0.15
    osoph
    0.15
    .Identity
    0.14
     Mustafa
    0.14
     identity
    0.14
    ession
    0.14
     есть
    0.14
    ("(%
    0.14
    Act Density 0.126%

    No Known Activations