INDEX
    Explanations
    Head Attr Weights
    0:0.10
    1:0.08
    2:0.12
    3:0.06
    4:0.04
    5:0.06
    6:0.06
    7:0.03
    8:0.07
    9:0.05
    10:0.17
    11:0.11
    Negative Logits
    Token                  Logit
    [undecodable token]    -2.56
    " newcom"              -2.14
    " Funny"               -2.08
    " Newsp"               -2.06
    " distur"              -2.05
    " censorship"          -1.97
    " stuffing"            -1.90
    "Latest"               -1.90
    " Flavoring"           -1.84
    " Advertisement"       -1.84
    Positive Logits
    Token    Logit
    "k"      2.63
    "xi"     2.39
    "lo"     2.35
    "ks"     2.34
    "xs"     2.31
    "lb"     2.28
    "mx"     2.23
    "kb"     2.23
    "gm"     2.21
    "kt"     2.21
    Act Density 0.000%

    No Known Activations