INDEX
    Explanations
    No Explanations Found
    Head Attr Weights
    0:0.07
    1:0.07
    2:0.08
    3:0.08
    4:0.07
    5:0.09
    6:0.08
    7:0.08
    8:0.08
    9:0.07
    10:0.09
    11:0.08
    Negative Logits
    sein      -2.94
    archive   -2.93
    azes      -2.93
    agos      -2.86
    rosso     -2.76
    aspers    -2.75
    angelo    -2.73
     -----    -2.65
     conqu    -2.64
    defense   -2.61
    Positive Logits
     Jaw      2.97
     Clicker  2.97
     Lamb     2.96
     Jinn     2.95
     Kw       2.88
     Mulcair  2.78
    ~~~~      2.78
     Drone    2.76
     NL       2.72
     Osw      2.67
    Act Density 0.000%

    No Known Activations

    This feature has no known activations.