INDEX
    Explanations

    phrases indicating political leaning or directionality

    Head Attr Weights
    0:0.01
    1:0.02
    2:0.05
    3:0.07
    4:0.11
    5:0.02
    6:0.15
    7:0.34
    8:0.02
    9:0.03
    10:0.07
    11:0.06
    Negative Logits
    COMPLE    -2.04
    lance     -1.70
    ーティ       -1.57
    pite      -1.53
    ogether   -1.52
    ruction   -1.50
    birth     -1.47
              -1.46
              -1.46
    д         -1.43
    Positive Logits
     shoulders    1.73
     plun         1.71
     intuition    1.66
     shaky        1.64
     bandwagon    1.64
     volunt       1.63
     favorites    1.53
     toward       1.51
     leans        1.49
     directional  1.48
    Act Density 0.003%

    No Known Activations