INDEX
    Explanations

    interrogative phrases and questions related to feelings, actions, and moral dilemmas

    Head Attr Weights
    0:0.08
    1:0.04
    2:0.03
    3:0.08
    4:0.05
    5:0.09
    6:0.05
    7:0.02
    8:0.19
    9:0.29
    10:0.01
    11:0.02
    Negative Logits
    "�士"            -1.90
    "avour"         -1.73
    "laun"          -1.72
    " Beir"         -1.61
    " lacked"       -1.57
    "istration"     -1.56
    "court"         -1.54
    " Hansen"       -1.53
    " Provided"     -1.52
    "ラン"          -1.49
    Positive Logits
    " compare"      2.00
    "ilater"        1.74
    " miracle"      1.71
    " fry"          1.69
    " reconcile"    1.65
    " Explain"      1.65
    " regress"      1.64
    "ancial"        1.63
    " phosph"       1.62
    " metaphors"    1.61
    Act Density 0.032%

    No Known Activations