INDEX
    Explanations

    positive expressions and sentiments

    expressions of admiration and appreciation

    Negative Logits
    Token           Logit
    "erenn"         -0.80
    "soever"        -0.73
    "uria"          -0.65
    "bable"         -0.64
    "agues"         -0.63
    "Else"          -0.61
    "CENT"          -0.60
    "istance"       -0.59
    " operative"    -0.59
    " predicate"    -0.59
    Positive Logits
    Token           Logit
    " how"           1.85
    "how"            1.38
    " HOW"           1.12
    " why"           1.02
    " How"           0.98
    " what"          0.90
    "HOW"            0.81
    "How"            0.76
    " seeing"        0.75
    " whether"       0.75
    Activation Density: 0.467%

    No Known Activations