    Explanations

    the letter 'w' in various contexts

    Negative Logits
    Token          Logit
    "eki"          -0.17
    "iversal"      -0.17
    "ież"          -0.17
    "ibold"        -0.16
    "Coder"        -0.16
    "utorial"      -0.16
    "hci"          -0.16
    "isters"       -0.15
    "panies"       -0.14
    "likler"       -0.14
    Positive Logits
    Token          Logit
    "nik"          0.16
    "aza"          0.16
    "edException"  0.14
    "rag"          0.14
    "aver"         0.14
    "MAN"          0.13
    "лиз"          0.13
    "ÛĮÙĩ"         0.13
    "agt"          0.13
    " naked"       0.13
    Activation Density: 0.020%

    No Known Activations