INDEX
    Explanations
    No Explanations Found
    Head Attr Weights
    0:0.09
    1:0.06
    2:0.08
    3:0.08
    4:0.08
    5:0.07
    6:0.07
    7:0.07
    8:0.10
    9:0.08
    10:0.08
    11:0.08
    Negative Logits
    [undecodable token]  -1.99
    "Numbers"            -1.73
    [undecodable token]  -1.72
    [token missing]      -1.72
    [undecodable token]  -1.64
    "oultry"             -1.61
    "arers"              -1.57
    " Cathy"             -1.53
    [undecodable token]  -1.52
    "aughters"           -1.52
    Positive Logits
    " spoiler"  1.93
    "center"    1.62
    "pedia"     1.60
    "fram"      1.57
    "walker"    1.57
    " sidx"     1.56
    "wiki"      1.56
    "oiler"     1.55
    "pend"      1.50
    "sonian"    1.48
    Activation Density: 0.000%

    No Known Activations

    This feature has no known activations.