    Explanations

    instances of words related to expressing support or approval

    phrases indicating support for various actions or causes

    Negative Logits
    Token        Logit
    "ashtra"     -0.71
    "mAh"        -0.70
    " fing"      -0.70
    "Tracker"    -0.68
    "Vers"       -0.66
    "uder"       -0.66
    "mie"        -0.66
    "hole"       -0.61
    "orbit"      -0.61
    "naire"      -0.60
    Positive Logits
    Token              Logit
    "gotten"           0.73
    "ints"             0.70
    " embattled"       0.69
    " supporting"      0.66
    "enance"           0.65
    " vested"          0.65
    "icans"            0.65
    " unsupported"     0.65
    " whichever"       0.63
    " marginalized"    0.62
    Act Density 0.119%

    No Known Activations