INDEX
    Explanations

    negative implications or critiques regarding social and political issues

    Negative Logits
    " ende"          -0.75
    " scattering"    -0.72
    " myster"        -0.69
    "seless"         -0.68
    " seiz"          -0.65
    " federation"    -0.64
    " proport"       -0.64
    " encount"       -0.63
    " obser"         -0.63
    " notor"         -0.63
    Positive Logits
    "ï¸ı"            0.96
    "¯"              0.91
    "#$"             0.79
    "°"              0.79
    "Tea"            0.73
    "ef"             0.72
    "âĢł"            0.72
    "dj"             0.70
    "cue"            0.68
    "hips"           0.68
    Activation Density 0.120%

    No Known Activations