INDEX
    Explanations

    references to cultural concepts and identities

    Negative Logits
    ildo      -0.19
    elson     -0.19
    uten      -0.17
    ovel      -0.15
    odes      -0.15
    olson     -0.14
    omal      -0.14
    orsk      -0.14
    stadt     -0.14
    abouts    -0.14
    Positive Logits
    ìĿ´ìĸ´    0.15
    Shock     0.15
     prompt   0.15
    PEED      0.14
    lsi       0.14
    IMIT      0.14
    RYPT      0.14
    oho       0.14
     Zug      0.14
    ipay      0.14
    Act Density 0.008%

    No Known Activations