INDEX
    Explanations

    occurrences of the letter 'w' in various contexts

    Negative Logits
    Token       Logit
    "andest"    -0.15
    "jom"       -0.15
    "wort"      -0.15
    "quee"      -0.15
    "arb"       -0.14
    "ition"     -0.14
    " Düz"      -0.14
    " nomine"   -0.14
    "acen"      -0.13
    "Lane"      -0.13
    Positive Logits
    Token       Logit
    " w"        0.23
    "illo"      0.15
    " gre"      0.15
    "[w"        0.15
    "gle"       0.15
    "ingly"     0.14
    "rans"      0.14
    "ubern"     0.14
    "ering"     0.14
    " bell"     0.14
    Activation Density: 0.022%

    No Known Activations