INDEX
    Explanations

    concepts related to self-awareness and knowledge

    Negative Logits
        " inval"            -0.16
        "ære"               -0.15
        "azu"               -0.15
        "aub"               -0.15
        "izr"               -0.14
        "orce"              -0.14
        "utura"             -0.14
        "fir"               -0.14
        "wear"              -0.14
        " withd"            -0.14
    Positive Logits
        " about"             0.20
        "_about"             0.16
        "aton"               0.15
        " Thur"              0.15
        " understanding"     0.15
        "aman"               0.14
        " eye"               0.14
        "pire"               0.14
        "ollen"              0.14
        " is"                0.14
    Activation Density 0.202%

    No Known Activations