    Explanations

    Words that refer to specific letters of the alphabet, especially 'G' and its variations

    Negative Logits
    Token         Logit
    "tk"          -0.21
    "ен"          -0.20
    "ett"         -0.20
    "rid"         -0.19
    "ui"          -0.19
    "et"          -0.17
    "ettle"       -0.17
    "uh"          -0.17
    "ame"         -0.17
    "NU"          -0.17

    Positive Logits
    Token         Logit
    "opher"       0.22
    "rooms"       0.18
    "rafted"      0.18
    "localized"   0.18
    "libc"        0.18
    "nosis"       0.18
    " ener"       0.18
    "KD"          0.17
    "azing"       0.17
    "urus"        0.17
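
    The two logit lists rank vocabulary tokens by how strongly this feature raises or lowers their output logits. A common way to produce such lists for a sparse-autoencoder feature (an assumption here; the page does not state its method) is to take the dot product of the feature's decoder vector with each token's unembedding vector and keep the k lowest and k highest scores. The sketch below uses plain Python lists; the function name `logit_effects`, the toy vectors, and the choice to ignore any final LayerNorm scaling are all hypothetical.

```python
from typing import Dict, List, Tuple


def logit_effects(
    decoder_dir: List[float],
    unembed: Dict[str, List[float]],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Hypothetical sketch: rank tokens by the feature's direct logit effect.

    Assumption: a token's score is the dot product of the feature's decoder
    vector with that token's unembedding vector. Real tools may also apply
    normalisation (e.g. a final LayerNorm), which is ignored here.
    """
    # Direct effect of the decoder direction on each token's logit.
    scores = {
        tok: sum(d * u for d, u in zip(decoder_dir, col))
        for tok, col in unembed.items()
    }
    # Sort ascending so the most suppressed tokens come first.
    ranked = sorted(scores.items(), key=lambda item: item[1])
    negative = ranked[:k]        # k most suppressed tokens
    positive = ranked[::-1][:k]  # k most promoted tokens
    return negative, positive


# Toy example with made-up 2-dimensional vectors (not real model weights).
toy_unembed = {
    "opher": [1.0, 0.2],
    "tk": [-1.0, 0.1],
    "rooms": [0.5, 0.5],
}
neg, pos = logit_effects([0.2, 0.1], toy_unembed, k=2)
print([tok for tok, _ in pos])  # ['opher', 'rooms']
print([tok for tok, _ in neg])  # ['tk', 'rooms']
```

    With k=2 and only three toy tokens, the middle token appears in both lists; with a full vocabulary the two lists are disjoint in practice.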
    Activation density: 0.162%
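
    Activation density is usually read as the share of sampled token positions on which the feature is active, so 0.162% would correspond to roughly 1 token in 617. The sketch below assumes that "active" means an activation strictly above a threshold of zero (the `threshold` parameter and the function name are hypothetical) and that the result is reported as a percentage.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Hypothetical sketch: percentage of token positions where the feature fires.

    Assumption: a position counts as active when its activation is
    strictly greater than `threshold`.
    """
    acts = list(activations)
    if not acts:
        return 0.0  # no samples, so report zero density
    firing = sum(1 for a in acts if a > threshold)
    return 100.0 * firing / len(acts)


# Two active positions out of 1000 sampled tokens.
print(activation_density([0.0] * 998 + [1.5, 0.3]))  # 0.2
```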

    No Known Activations