INDEX
    Explanations

    phrases indicating movement toward a progressive or positive direction

    Head Attr Weights
    0:0.16
    1:0.02
    2:0.06
    3:0.17
    4:0.02
    5:0.05
    6:0.02
    7:0.05
    8:0.02
    9:0.01
    10:0.36
    11:0.02
    Negative Logits
    " fame"         -2.31
    " televised"    -2.06
    "��"            -2.02
    " expire"       -1.97
    " notoriety"    -1.95
    " cumbers"      -1.93
    " Serving"      -1.87
    " Ability"      -1.84
    " Recorded"     -1.82
    " residing"     -1.81
    Positive Logits
    " direction"    3.66
    "wrong"         3.35
    " directions"   3.31
    "oward"         3.16
    " wrong"        2.82
    "correct"       2.75
    " opposite"     2.70
    " towards"      2.55
    " Wrong"        2.51
    "wise"          2.49
    Act Density 0.033%

    No Known Activations