INDEX
    Explanations

    words related to specific groups or categories, such as professions, demographics, or ideologies

    references to specific groups or categories of people and their roles in various contexts

    Negative Logits
    " scrimmage"    -0.66
    "ulative"       -0.59
    " Held"         -0.59
    "iasis"         -0.58
    " 0004"         -0.56
    " Carbuncle"    -0.54
    "rift"          -0.54
    "oward"         -0.54
    "umn"           -0.54
    "ieves"         -0.53
    Positive Logits
    " itself"        1.16
    " ones"          1.11
    " themselves"    0.98
    ")</"            0.89
    "!)."            0.87
    " thereof"       0.83
    " himself"       0.83
    " herself"       0.81
    ").["            0.77
    ")."             0.76
    Activation Density: 0.570%

    No Known Activations