    Explanations

    words with the letter 'r' in various forms and contexts

    Negative Logits
    Token       Logit
    'anos'      -0.20
    'mos'       -0.16
    'anders'    -0.16
    'rod'       -0.16
    'orners'    -0.15
    ' subst'    -0.15
    'iler'      -0.15
    'zen'       -0.15
    'efe'       -0.15
    'agrams'    -0.14
    Positive Logits
    Token       Logit
    'attach'    0.24
    'ê'         0.21
    'oya'       0.21
    'appro'     0.20
    'alent'     0.20
    'ense'      0.18
    'ég'        0.18
    'oi'        0.17
    'attr'      0.17
    'iche'      0.17
    Activation Density: 0.007%

    No Known Activations