INDEX
    Explanations

    phrases with negative connotations or controversial topics

    Negative Logits
    Token           Logit
    "Iglesia"       -0.50
    " święta"       -0.43
    " Sánchez"      -0.43
    " Williams"     -0.42
    " száll"        -0.42
    "classnames"    -0.41
    " RELIGION"     -0.41
    " Петра"        -0.40
    " Rivers"       -0.40
    " vertelt"      -0.40
    Positive Logits
    Token           Logit
    " Fo"           1.18
    "Fo"            1.14
    " FO"           1.05
    "fo"            0.97
    " fo"           0.97
    "FO"            0.87
    " Foil"         0.87
    " foams"        0.84
    " FOG"          0.83
    " foaming"      0.82
    Act Density 0.138%

    No Known Activations