INDEX
    Explanations

    mentions of categories or groups within society

    terms related to various groups and categories of people

    Negative Logits
    "Reloaded"                          -0.75
    "OLOG"                              -0.67
    "saf"                               -0.67
    "rawdownloadcloneembedreportprint"  -0.65
    "ASED"                              -0.63
    "ilogy"                             -0.62
    "GoldMagikarp"                      -0.60
    "oÄŁ"                               -0.59
    " charm"                            -0.59
    " pestic"                           -0.58

    Positive Logits
    "hips"          1.15
    "paces"         1.13
    " alike"        1.13
    "hip"           1.11
    "pace"          0.98
    "hops"          0.82
    "chool"         0.82
    "ets"           0.80
    "'"             0.77
    "ervatives"     0.75

    Activation Density: 0.456%

    No Known Activations