INDEX
    Explanations
    No Explanations Found
    Head Attr Weights
    0:0.07
    1:0.07
    2:0.09
    3:0.08
    4:0.09
    5:0.08
    6:0.09
    7:0.08
    8:0.08
    9:0.07
    10:0.09
    11:0.07
    Negative Logits
    "gradation"      -1.84
    "usable"         -1.69
    "Reviewer"       -1.65
    "ingu"           -1.63
    "ensibly"        -1.59
    "onomic"         -1.55
    "ongyang"        -1.52
    "formance"       -1.52
    " possession"    -1.51
    "ioned"          -1.50
    Positive Logits
    " Nap"       1.61
    "icz"        1.56
    " Mig"       1.55
    "eni"        1.52
    "talking"    1.52
    " SPD"       1.50
    " Manz"      1.50
    "mosp"       1.50
    " laughs"    1.50
    " Humph"     1.45
    Act Density 0.000%

    No Known Activations

    This feature has no known activations.