INDEX
Explanations
references to specific individuals or groups, particularly pronouns like "him" and "them."
Negative Logits
Token      Logit
Gastro     -0.47
Gastro     -0.47
Ekonomi    -0.46
Poverty    -0.46
paleo      -0.45
gastro     -0.45
kilo       -0.45
electro    -0.44
socio      -0.44
Austro     -0.44
Positive Logits
Token        Logit
them         0.80
him          0.80
Them         0.74
THEM         0.72
them         0.72
Him          0.72
Them         0.71
us           0.71
subpackage   0.63
Him          0.60
Activations Density 0.152%