INDEX
    Explanations
    No Explanations Found
    Head Attr Weights
    0:0.09
    1:0.05
    2:0.08
    3:0.08
    4:0.10
    5:0.10
    6:0.07
    7:0.07
    8:0.07
    9:0.07
    10:0.08
    11:0.07
    Negative Logits
     comr         -1.71
     biscuits     -1.49
     antioxid     -1.48
     ¯            -1.46
     bribes       -1.45
     monarch      -1.42
    *)            -1.42
     compliments  -1.41
    phia          -1.41
     unden        -1.41
    Positive Logits
    yg            1.66
    reshold       1.65
    rb            1.61
    ua            1.59
    gger          1.59
    iannopoulos   1.56
    urst          1.55
    scan          1.51
    attr          1.50
    bg            1.48
    Act Density 0.000%

    No Known Activations

    This feature has no known activations.