INDEX
Explanations
references to political minority groups and their struggles
Negative Logits
Token    Logit
oland    -0.16
isque    -0.15
erah     -0.14
inent    -0.14
ante     -0.14
hub      -0.14
373      -0.13
eya      -0.13
agu      -0.13
636      -0.13
Positive Logits
Token       Logit
groups      0.47
minority    0.43
minorities  0.40
groups      0.40
group       0.38
-groups     0.37
Groups      0.36
minor       0.34
.groups     0.34
segments    0.33
Activations Density 0.204%
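
The density figure above is most naturally read as the share of sampled token positions on which the feature fires. The page does not state how it was computed, so the following is only a minimal sketch under that assumption. The function name, the zero threshold, and the toy data are all hypothetical and are not taken from the dashboard.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: a feature counts as "active" when its activation is strictly
    greater than the threshold (0.0 by default). The dashboard may use a
    different rule.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: 1000 token positions, 2 of which fire.
toy = [0.0] * 998 + [1.7, 0.3]
print(f"{activation_density(toy):.3%}")  # prints 0.200%
```

Under this reading, a density of 0.204% would mean the feature is active on roughly 2 of every 1000 sampled tokens.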