INDEX
    Explanations

    phrases related to support or endorsement

    instances of the word "back" or its variations related to support or defense

    Negative Logits
    thora     -0.74
    itizen    -0.69
    entric    -0.69
    nesota    -0.61
    lys       -0.61
    orp       -0.61
    ifix      -0.60
    pox       -0.60
    è¦ļéĨĴ    -0.59
    cz        -0.59
    Positive Logits
    track      1.25
     away      0.95
    ped        0.94
    tracking   0.79
    tr         0.77
    INTON      0.77
     up        0.77
    dash       0.77
    drive      0.74
     down      0.73
    Act Density: 0.030%

    No Known Activations