    Explanations

    concepts related to individual identities and their roles

    Negative Logits
     one
    -0.24
    lant
    -0.19
    .ends
    -0.16
    тя
    -0.15
    exo
    -0.15
    icorn
    -0.14
    ERY
    -0.14
     одно
    -0.14
    atu
    -0.14
    ixa
    -0.14
    Positive Logits
    -dimensional
    0.26
    -sided
    0.25
     liners
    0.23
    -way
    0.23
    onta
    0.22
    -third
    0.21
    -direction
    0.20
     particular
    0.20
    SELF
    0.19
    's
    0.19
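    The two lists above rank vocabulary tokens by how much this feature pushes their logits down or up. The page does not say how the scores are computed. A common approach on sparse-autoencoder dashboards is to project the feature's decoder direction through the model's unembedding matrix and take the bottom-k and top-k entries. The sketch below assumes that approach. The function name `top_logit_effects` and its arguments (`w_dec_row`, `w_unembed`, `vocab`, `k`) are hypothetical illustrations, not any tool's actual API.

```python
import numpy as np

def top_logit_effects(w_dec_row, w_unembed, vocab, k=10):
    """Hypothetical sketch: rank tokens by one feature's direct logit effect.

    Assumes w_dec_row is the feature's decoder direction, shape (d_model,),
    and w_unembed is the unembedding matrix, shape (d_model, vocab_size).
    """
    # Direct effect of the feature direction on every vocabulary logit.
    logits = w_dec_row @ w_unembed  # shape (vocab_size,)
    # Ascending sort: most negative effects first.
    order = np.argsort(logits)
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    # Reverse the order so the most positive effects come first.
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example with a 2-dimensional residual stream and a 4-token vocabulary.
vocab = ["a", "b", "c", "d"]
w_dec_row = np.array([1.0, 0.0])
w_unembed = np.array([[0.5, -0.2, 0.1, -0.4],
                      [9.0, 9.0, 9.0, 9.0]])
neg, pos = top_logit_effects(w_dec_row, w_unembed, vocab, k=2)
print(neg)  # [('d', -0.4), ('b', -0.2)]
print(pos)  # [('a', 0.5), ('c', 0.1)]
```

Because the second row of `w_unembed` is multiplied by a zero component, it does not affect the result. The ranking depends only on the coordinates in which the decoder direction is nonzero.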
    Activation Density 0.117%
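    Activation density is usually read as the fraction of sampled token positions on which the feature fires. At 0.117%, that is roughly 1.17 tokens per 1,000. The sketch below assumes a strict "activation > 0" firing threshold over a flat array of per-token activations. Neither the threshold nor the sampled dataset is stated on the page, and the function name `activation_density` is hypothetical.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Hypothetical sketch: fraction of token positions where the feature fires.

    Assumes acts holds one activation value per token position and that
    "firing" means the activation is strictly greater than threshold.
    """
    acts = np.asarray(acts, dtype=float)
    return float(np.mean(acts > threshold))

# Toy example: the feature fires on 1 of 8 token positions.
d = activation_density([0, 0, 0.3, 0, 0, 0, 0, 0])
print(f"{d:.3%}")  # 12.500%
```

A higher threshold would report a lower density for the same activations. This is why density figures are only comparable when they use the same threshold.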

    No Known Activations