INDEX
    Explanations
    Head Attr Weights
    0:0.03
    1:0.02
    2:0.04
    3:0.05
    4:0.04
    5:0.04
    6:0.40
    7:0.08
    8:0.05
    9:0.07
    10:0.06
    11:0.05
    Negative Logits
    " abdom":-1.36
    " broom":-1.29
    " cess":-1.29
    "arily":-1.23
    "lessly":-1.22
    " sidelines":-1.21
    " cigars":-1.16
    "metic":-1.15
    " levers":-1.14
    " stabilization":-1.13
    Positive Logits
    "Pac":1.34
    "ndra":1.25
    "arie":1.25
    " Williamson":1.23
    "Keefe":1.23
    "ukong":1.23
    " Goodwin":1.22
    "hetti":1.20
    "coni":1.20
    "chini":1.20
    Act Density 0.002%

    No Known Activations