INDEX
    Explanations

    references to groups of people or collective actions

    references to collective experiences or generalizations about groups of people

    Negative Logits
    Token       Logit
    "aic"       -0.87
    "ean"       -0.81
    "eda"       -0.75
    "antic"     -0.74
    "ularity"   -0.73
    "ea"        -0.72
    "rd"        -0.69
    "eus"       -0.67
    "hent"      -0.67
    "effect"    -0.65
    Positive Logits
    Token       Logit
    " else"     1.13
    "selves"    0.80
    "bags"      0.79
    "THING"     0.77
    "WAYS"      0.76
    " wanna"    0.75
    " nodd"     0.73
    "````"      0.73
    " gotta"    0.72
    "bage"      0.72
    Activation Density: 0.040%

    No Known Activations