INDEX
    Explanations

    occurrences of the letter 'w' in upper or lower case

    Negative Logits
    Token       Logit
    ' ་་'       -0.88
    ').)'       -0.85
    ').}'       -0.83
    '.";'       -0.79
    '°;'        -0.78
    ').]'       -0.78
    ' *}'       -0.75
                -0.74
    '())->'     -0.72
    "')):"      -0.70
    Positive Logits
    Token       Logit
    'w'          2.14
    ' w'         2.13
    ' W'         1.03
    '𝐰'          0.98
    'ww'         0.97
    'W'          0.96
    'iw'         0.94
    'mw'         0.91
    ' jw'        0.90
    '𝙬'          0.89
    Activation Density: 0.098%

    No Known Activations
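
One way to sanity-check the explanation against the positive-logit tokens listed above is sketched below. It is a minimal illustration, not part of the dashboard: the helper name `contains_w` and the choice of NFKC normalization plus case folding are assumptions made here, so that styled forms such as the mathematical bold '𝐰' count as the plain letter.

```python
import unicodedata

# Top positive-logit tokens and values, copied from the list above.
POSITIVE_LOGITS = [
    ("w", 2.14), (" w", 2.13), (" W", 1.03), ("𝐰", 0.98), ("ww", 0.97),
    ("W", 0.96), ("iw", 0.94), ("mw", 0.91), (" jw", 0.90), ("𝙬", 0.89),
]


def contains_w(token: str) -> bool:
    """Return True if the token contains 'w' after NFKC normalization and case folding.

    Hypothetical helper: NFKC maps styled mathematical letters to their
    plain counterparts, and casefold() removes the case distinction.
    """
    normalized = unicodedata.normalize("NFKC", token).casefold()
    return "w" in normalized


# Collect the tokens that are consistent with the explanation.
matches = [token for token, _ in POSITIVE_LOGITS if contains_w(token)]
print(f"{len(matches)}/{len(POSITIVE_LOGITS)} positive-logit tokens contain 'w'")
```

The same check could be run on the negative-logit tokens, which are mostly closing punctuation and would not be expected to match.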