    Head Attribution Weights (head:weight)
    0:0.06
    1:0.05
    2:0.04
    3:0.08
    4:0.15
    5:0.06
    6:0.04
    7:0.07
    8:0.03
    9:0.03
    10:0.10
    11:0.24
    Negative Logits
    Token             Logit
     flush            -2.46
     circled          -2.35
     eleph            -2.35
     flared           -2.28
     tremend          -2.09
    `.                -2.09
     oun              -2.08
     ``               -2.08
     flushed          -2.06
     hur              -2.04
    Positive Logits
    Token             Logit
    ?,                3.15
    /,                2.95
    chens             2.77
    [token missing]   2.63
    ®,                2.41
    @                 2.31
    ,[                2.27
    .,                2.27
    %,                2.27
    ","               2.25
    Activation Density: 0.009%

    No Known Activations