INDEX
Explanations
language related to categorization and belonging, such as membership, relatedness, race, and elites
references to ethnic or cultural group membership
Negative Logits
effic          -0.73
displayText    -0.68
charges        -0.68
unanswered     -0.66
unfolds        -0.64
Results        -0.62
efficiency     -0.62
Everything     -0.61
hitting        -0.61
fumble         -0.61
Positive Logits
hapl           1.04
caste          1.00
ethnic         0.99
groups         0.98
category       0.98
sects          0.96
minority       0.96
subclass       0.95
sect           0.95
genus          0.94
Activations Density: 0.289%