    Explanations

    the letter "W" and its occurrences in various contexts

    Negative Logits
    Token        Logit
    "allet"      -0.21
    "hang"       -0.17
    "ins"        -0.17
    "all"        -0.17
    "arrow"      -0.16
    " widely"    -0.16
    "atcher"     -0.16
    "ie"         -0.16
    "are"        -0.16
    "as"         -0.15
    Positive Logits
    Token        Logit
    "etter"      0.18
    "anj"        0.18
    "istar"      0.18
    "tower"      0.18
    "bsite"      0.17
    "tf"         0.17
    "atan"       0.17
    "è"          0.16
    "yr"         0.16
    "roc"        0.16
    Act Density 0.088%

    No Known Activations