INDEX
    Explanations

    References to specific individuals or groups, particularly pronouns such as "him" and "them."

    Negative Logits
    Token         Logit
    "Gastro"      -0.47
    " Gastro"     -0.47
    " Ekonomi"    -0.46
    "Poverty"     -0.46
    " paleo"      -0.45
    " gastro"     -0.45
    "kilo"        -0.45
    "electro"     -0.44
    "socio"       -0.44
    " Austro"     -0.44
    Positive Logits
    Token           Logit
    " them"         0.80
    " him"          0.80
    " Them"         0.74
    " THEM"         0.72
    "them"          0.72
    "Him"           0.72
    "Them"          0.71
    " us"           0.71
    "subpackage"    0.63
    " Him"          0.60
    Activation Density: 0.152%

    No Known Activations