INDEX
    Explanations

    phrases related to social issues and disparities

    Negative Logits
    izons    -0.78
    agon    -0.72
    onet    -0.72
    oud    -0.71
    gered    -0.71
    ema    -0.69
    20439    -0.69
    iban    -0.68
    itol    -0.68
    alysed    -0.67
    Positive Logits
     namely    0.95
     Whenever    0.91
     Anyone    0.89
     Unless    0.85
     Why    0.84
     Until    0.84
     When    0.84
     Firstly    0.84
     Where    0.83
     Forget    0.83
    Activation Density: 0.120%

    No Known Activations