INDEX
    Explanations

    instances of the letter "W" in various contexts

    Negative Logits
    "idget"    -0.19
    "allet"    -0.17
    "arrow"    -0.17
    "eb"       -0.17
    "ave"      -0.16
    "heel"     -0.15
    "iki"      -0.15
    "alls"     -0.15
    "ork"      -0.15
    "uges"     -0.15
    Positive Logits
    "enh"      0.15
    "WISE"     0.15
    "ань"      0.14
    "ictor"    0.14
    "orgen"    0.13
    " ho"      0.13
    "iert"     0.13
    " simul"   0.13
    "anel"     0.13
    " pros"    0.13
    Act Density 0.039%

    No Known Activations