INDEX
    Explanations

    intensifiers or modifiers that express exaggeration or extremes

    Head Attr Weights
    0:0.06
    1:0.03
    2:0.05
    3:0.16
    4:0.03
    5:0.05
    6:0.02
    7:0.06
    8:0.03
    9:0.02
    10:0.42
    11:0.03
    Negative Logits
    " intention"    -2.37
    " neutral"      -2.03
    " intentions"   -2.02
    "nor"           -2.00
    " hope"         -1.93
    " unaffected"   -1.93
    "lication"      -1.92
    " endeav"       -1.90
    " anticip"      -1.88
    "neutral"       -1.87
    Positive Logits
    " dstg"      2.74
    " quicker"   2.40
    "eeper"      2.37
    " faster"    2.37
    " hotter"    2.17
    " MUCH"      2.10
    " tighter"   2.10
    " worse"     2.07
    " clearer"   2.04
    "yo"         2.03
    Act Density 0.018%

    No Known Activations