    Explanations

    phrases that express varying degrees of comparison or judgment

    Head Attr Weights
    0:0.02
    1:0.02
    2:0.16
    3:0.09
    4:0.28
    5:0.02
    6:0.06
    7:0.10
    8:0.04
    9:0.04
    10:0.06
    11:0.06
    Negative Logits
     Bei
    -1.61
    taboola
    -1.52
     Cosponsors
    -1.46
    20439
    -1.43
    -1.38
    fam
    -1.36
     Participant
    -1.36
    OURCE
    -1.35
    Story
    -1.34
    ettings
    -1.34
    Positive Logits
     squared
    1.66
     sizing
    1.60
     messing
    1.59
     ner
    1.55
     wrinkles
    1.50
     sloppy
    1.44
     bells
    1.42
     hormones
    1.39
     joking
    1.38
     tweaking
    1.37
    Act Density 0.113%

    No Known Activations