INDEX
    Explanations

    expressions of discomfort or unease

    Head Attr Weights
    0:0.09
    1:0.07
    2:0.09
    3:0.08
    4:0.08
    5:0.08
    6:0.06
    7:0.08
    8:0.09
    9:0.08
    10:0.07
    11:0.08
    Negative Logits
    " adapting"       -2.11
    " conserve"       -2.10
    " jurisd"         -2.05
    " technically"    -2.04
    "izoph"           -2.02
    " lest"           -2.02
    " contend"        -1.99
    " exagger"        -1.98
    " stren"          -1.98
    " asserting"      -1.97
    Positive Logits
    "netflix"    2.59
    "raq"        2.15
    "ucer"       2.08
    "els"        2.07
    "{\"         2.02
    "rice"       1.98
    "ho"         1.97
    "urtle"      1.96
    "yahoo"      1.96
    "ghost"      1.96
    Act Density 0.000%

    No Known Activations