INDEX
    Explanations
    No Explanations Found
    Head Attribution Weights
    Head    Weight
    0       0.06
    1       0.08
    2       0.09
    3       0.08
    4       0.08
    5       0.09
    6       0.09
    7       0.06
    8       0.08
    9       0.07
    10      0.07
    11      0.08
    Negative Logits
    Token (quoted to show leading spaces)    Logit
    " tongue"                                -1.71
    " mistaken"                              -1.63
    " questioning"                           -1.58
    " resent"                                -1.57
    " reconc"                                -1.56
    " tongues"                               -1.52
    " derogatory"                            -1.52
    " remark"                                -1.51
    "theless"                                -1.51
    " deterior"                              -1.50
    Positive Logits
    Token (quoted to show leading spaces)    Logit
    "xa"                                      1.84
    " ADA"                                    1.80
    " EF"                                     1.67
    "adata"                                   1.60
    "ット"                                    1.60
    "364"                                     1.58
    "ATA"                                     1.58
    "RR"                                      1.57
    "RA"                                      1.53
    " Agenda"                                 1.53
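Logit lists like these are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and reading off the most negative and most positive tokens. The sketch below illustrates that computation under the assumption that this page derives its lists the same way; the function name `top_logits`, the toy vocabulary, and the `decoder_dir` and `W_U` values are hypothetical and chosen only to keep the example small.

```python
def top_logits(decoder_dir, W_U, vocab, k=10):
    """Return the k most negative and k most positive (token, logit) pairs.

    Assumption: decoder_dir is the feature's decoder vector (d_model floats)
    and W_U is the unembedding matrix stored as d_model rows of len(vocab) floats.
    """
    d_model = len(decoder_dir)
    # The logit for token j is the dot product of decoder_dir with column j of W_U.
    logits = [
        sum(decoder_dir[i] * W_U[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    # Sort ascending by logit: the head holds the most negative tokens.
    ranked = sorted(zip(vocab, logits), key=lambda pair: pair[1])
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return negative, positive


# Hypothetical toy example: d_model = 2, vocabulary of 4 tokens.
vocab = [" tongue", "xa", " ADA", " remark"]
W_U = [
    [-1.0, 1.0, 0.5, -0.5],
    [0.0, 0.5, 0.5, -1.0],
]
decoder_dir = [1.0, 1.0]
negative, positive = top_logits(decoder_dir, W_U, vocab, k=2)
print(negative)  # [(' remark', -1.5), (' tongue', -1.0)]
print(positive)  # [('xa', 1.5), (' ADA', 1.0)]
```

In a real model the vocabulary has tens of thousands of entries, so a vectorized matrix product would replace the nested loops; the plain-Python version only makes the arithmetic explicit.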
    Activation Density: 0.000%

    No Known Activations

    This feature has no known activations.
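A density of 0.000% is consistent with the empty activation list above. The sketch below assumes density means the percentage of sampled tokens on which the feature's activation exceeds zero; the function name `activation_density` and its `threshold` parameter are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose feature activation exceeds `threshold`.

    Assumption: `activations` holds one activation value per sampled token.
    """
    if not activations:
        return 0.0
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# A feature that never fires over the sampled tokens has density 0.000%.
print(f"{activation_density([0.0, 0.0, 0.0, 0.0]):.3f}%")  # 0.000%
print(f"{activation_density([0.0, 2.5, 0.0, 1.0]):.3f}%")  # 50.000%
```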