    Explanations

    occurrences of the letter 'b'

    Negative Logits
    Token    Logit
    r        -0.28
    u        -0.23
    l        -0.23
    et       -0.23
    lk       -0.23
    an       -0.21
    uD       -0.21
    j        -0.21
    id       -0.20
    ul       -0.20

    Positive Logits
    Token    Logit
    ellow    0.20
    oston    0.19
    oulder   0.19
    idders   0.19
    obby     0.18
    ounces   0.18
    rowning  0.18
    rian     0.18
    itters   0.17
    uster    0.17

    Activation Density: 0.017%

    No Known Activations