INDEX
    Explanations

    the repetition of the letter 'w' in various forms

    Negative Logits
    Token            Logit
    "empt"           -0.17
    "hu"             -0.16
    "hang"           -0.16
    "nÃŃ"            -0.15
    "lob"            -0.15
    "¯ÃĤ"            -0.15
    "ع"              -0.14
    "oped"           -0.14
    " exc"           -0.14
    "lets"           -0.14

    Positive Logits
    Token            Logit
    "irtschaft"       0.21
    " hat"            0.20
    "bsite"           0.20
    "issenschaft"     0.20
    "inder"           0.18
    "anj"             0.18
    "icz"             0.18
    "istar"           0.18
    "affle"           0.17
    "avy"             0.17
    Activation Density: 0.162%

    No Known Activations