INDEX
    Explanations

    words related to concepts of awareness and understanding

    Negative Logits
    agli      -0.15
    agem      -0.15
    ピー      -0.15
     Alta     -0.14
    yla       -0.14
    coli      -0.14
    mentions  -0.14
    cket      -0.13
    ensch     -0.13
    paren     -0.13
    Positive Logits
     Ign      0.24
    oring     0.23
    acio      0.23
    ition     0.22
    orer      0.22
    ite       0.21
    eous      0.20
    azio      0.20
    ored      0.20
    orable    0.20
    Act Density 0.017%

    No Known Activations