INDEX
    Explanations

    phrases related to societal issues or criticisms

    Negative Logits
    CLASSIFIED  -0.73
    mask        -0.64
    TIME        -0.64
    ALSE        -0.63
     %%         -0.61
    PLAY        -0.61
     lasted     -0.60
    ATURES      -0.60
    #$          -0.60
    FIG         -0.60
    Positive Logits
     favor         1.53
     favour        1.34
     lieu          1.29
     order         1.19
    efficiency     1.15
     spite         1.15
     vitro         1.11
     accordance    1.11
    effic          1.11
     anticipation  1.03
    Activation Density: 0.336%

    No Known Activations