INDEX
    Explanations
    No Explanations Found
    Head Attr Weights
    0:0.05
    1:0.05
    2:0.08
    3:0.11
    4:0.08
    5:0.06
    6:0.07
    7:0.06
    8:0.06
    9:0.08
    10:0.09
    11:0.17
    Negative Logits
     decomp      -1.63
     tut         -1.62
     compiled    -1.60
     fatig       -1.58
     emb         -1.49
     Deng        -1.48
     mater       -1.47
     therapists  -1.47
    ographies    -1.45
     antiqu      -1.45
    Positive Logits
    antha   2.17
    osher   1.91
    retty   1.81
    zee     1.81
    ーティ  1.79
    nom     1.72
    WARD    1.71
    udeau   1.67
    deal    1.65
    sth     1.64
    Act Density 0.000%

    No Known Activations

    This feature has no known activations.