    Explanations

    phrases that imply manipulation or diversion of attention

    Negative Logits
    Token                       Logit
    "ntag"                      -0.16
    "enco"                      -0.16
    "icio"                      -0.16
    " upstream"                 -0.15
    "dater"                     -0.15
    "InterfaceOrientation"      -0.15
    "odega"                     -0.15
    "bish"                      -0.15
    "uts"                       -0.14
    " HOLDERS"                  -0.14
    Positive Logits
    Token                       Logit
    " away"                     0.36
    " attention"                0.33
    " toward"                   0.31
    " Away"                     0.30
    " towards"                  0.29
    "attention"                 0.29
    " Attention"                0.28
    " divert"                   0.27
    " diverted"                 0.27
    " onto"                     0.25
    Activation Density: 0.060%

    No Known Activations