    Head Attr Weights
    0:0.11
    1:0.19
    2:0.03
    3:0.03
    4:0.02
    5:0.23
    6:0.07
    7:0.02
    8:0.07
    9:0.04
    10:0.07
    11:0.09
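The head attribution weights above are easier to read when ranked. The sketch below assumes each entry maps one of 12 attention heads (indices 0 to 11) to its share of the feature's attribution. The dictionary name and helper functions are hypothetical and not part of any dashboard API. With the listed values, heads 5, 1 and 0 carry the most weight, and the total is about 0.97 rather than exactly 1, most likely because each value is rounded to two decimals.

```python
# Head attribution weights as listed above (head index -> weight).
# ASSUMPTION: each value is that head's share of the feature's attribution.
HEAD_ATTR_WEIGHTS = {
    0: 0.11, 1: 0.19, 2: 0.03, 3: 0.03, 4: 0.02, 5: 0.23,
    6: 0.07, 7: 0.02, 8: 0.07, 9: 0.04, 10: 0.07, 11: 0.09,
}


def top_heads(weights, k=3):
    """Return the k (head, weight) pairs with the largest weights, highest first."""
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:k]


def total_weight(weights):
    """Sum all listed weights, rounded to two decimals to absorb float noise."""
    return round(sum(weights.values()), 2)


if __name__ == "__main__":
    print(top_heads(HEAD_ATTR_WEIGHTS))     # [(5, 0.23), (1, 0.19), (0, 0.11)]
    print(total_weight(HEAD_ATTR_WEIGHTS))  # 0.97
```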
    Negative Logits
    " satur"     -1.75
    " smokes"    -1.62
    " nic"       -1.61
    " ABS"       -1.56
    " IC"        -1.51
    " Stan"      -1.50
    " wast"      -1.50
    " GI"        -1.50
    "het"        -1.49
    " matt"      -1.49
    Positive Logits
    "iannopoulos"   2.21
    "yssey"         2.02
    "cffffcc"       1.86
    "aeus"          1.85
    "yrinth"        1.83
    "18"            1.81
    "3"             1.71
    "1"             1.71
    "16"            1.71
    "2"             1.69
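The two logit lists rank vocabulary tokens by how strongly the feature pushes them up or down. A common way SAE dashboards produce such lists, assumed here rather than confirmed by this page, is to project the feature's decoder direction through the model's unembedding matrix and keep the highest and lowest scores. The sketch below does this with plain Python lists, ignores any final normalization the real model applies, and uses a hypothetical toy vocabulary and matrices purely for illustration.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Project a decoder direction (length d_model) through an unembedding
    matrix laid out as [d_model][vocab], giving one score per vocabulary token."""
    vocab = len(unembed[0])
    return [sum(decoder_dir[i] * unembed[i][t] for i in range(len(decoder_dir)))
            for t in range(vocab)]


def top_and_bottom(scores: Sequence[float], tokens: Sequence[str],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (k highest pairs, highest first) and (k lowest pairs, lowest first)."""
    ranked = sorted(zip(tokens, scores), key=lambda p: p[1])  # ascending by score
    return ranked[::-1][:k], ranked[:k]


if __name__ == "__main__":
    # Hypothetical toy setup: d_model = 2, vocabulary of 4 tokens.
    tokens = ["a", "b", "c", "d"]
    decoder_dir = [1.0, -0.5]
    unembed = [[1.0, 0.0, 2.0, -1.0],
               [0.0, 2.0, 1.0, 0.0]]
    scores = logit_effects(decoder_dir, unembed)
    print(scores)                            # [1.0, -1.0, 1.5, -1.0]
    print(top_and_bottom(scores, tokens, 2))
```

Sorting once and slicing both ends keeps the positive and negative lists consistent with each other; for a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nlargest` would avoid sorting everything.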
    Activation Density 0.008%
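An activation density of 0.008% means the feature fires on roughly 8 of every 100,000 tokens, so it is extremely sparse. The sketch below assumes density is the fraction of sampled token positions whose activation is strictly above zero. The dataset and threshold the dashboard actually uses are not stated here, and the function names are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the feature activation exceeds threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


def as_percent(fraction):
    """Format a fraction as a percentage with three decimal places."""
    return f"{fraction * 100:.3f}%"


if __name__ == "__main__":
    # Hypothetical sample: 100,000 token positions, 8 of which activate.
    acts = [0.0] * 100_000
    for pos in (10, 2_000, 15_000, 30_000, 47_000, 61_000, 80_000, 99_999):
        acts[pos] = 1.0
    density = activation_density(acts)
    print(as_percent(density))  # 0.008%
```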

    No Known Activations