INDEX
    Explanations
    No Explanations Found
    Head Attr Weights
    0:0.08
    1:0.06
    2:0.09
    3:0.07
    4:0.08
    5:0.07
    6:0.07
    7:0.08
    8:0.08
    9:0.08
    10:0.08
    11:0.09
    Negative Logits
    ourgeois:-2.00
    xual:-1.87
    pless:-1.78
     structure:-1.75
     model:-1.65
    akery:-1.65
     feats:-1.64
    arte:-1.64
     models:-1.63
     successor:-1.60
    Positive Logits
    Ver:1.86
     Rory:1.84
    Keith:1.74
    essler:1.74
     Zach:1.72
     Reilly:1.72
     Morgan:1.69
     Emerson:1.65
     stim:1.65
     Trent:1.63
    Activation Density: 0.000%

    No Known Activations
