INDEX
    Explanations

    phrases indicating approval or acknowledgment

    Head Attr Weights
    0:0.02
    1:0.06
    2:0.13
    3:0.04
    4:0.02
    5:0.03
    6:0.05
    7:0.11
    8:0.32
    9:0.04
    10:0.06
    11:0.07
    Negative Logits
    ials         -1.22
    arrow        -1.17
    zie          -1.16
    yond         -1.16
    arters       -1.14
    rongh        -1.13
     increments  -1.11
    dding        -1.10
    nings        -1.10
    gaard        -1.10
    Positive Logits
    ently          1.28
    pires          1.22
     belonged      1.17
    ITED           1.16
    edly           1.10
    Offic          1.09
     loudly        1.08
    pired          1.06
    ּ              1.05
     passionately  1.03
    Act Density 0.110%

    No Known Activations