INDEX
    Explanations

    phrases related to bias in various contexts

    Negative Logits
    Token               Logit
    " Baltimore"        -0.21
    " Kansas"           -0.18
    "Kansas"            -0.17
    " Worcester"        -0.17
    " Nancy"            -0.16
    " Maryland"         -0.15
    " Arkansas"         -0.15
    " Lowell"           -0.15
    "illac"             -0.15
    " Massachusetts"    -0.15
    Positive Logits
    Token               Logit
    " Titan"            0.39
    " Titans"           0.39
    " titan"            0.37
    " tit"              0.31
    "Titan"             0.29
    " Attack"           0.28
    "Tit"               0.28
    " Tit"              0.27
    " Levi"             0.27
    " Survey"           0.24
    Act Density 0.002%

    No Known Activations