INDEX
Explanations
nationalities of different groups of people
references to specific nationalities or ethnic groups
Negative Logits
Token           Logit
Chain           -0.69
Chain           -0.63
Smith           -0.63
scape           -0.62
miss            -0.61
confirmation    -0.61
CHECK           -0.60
Virginia        -0.59
Leader          -0.59
ologically      -0.59
Positive Logits
Token           Logit
aurus           1.02
paces           1.00
ugi             0.89
who             0.88
ourced          0.81
ervatives       0.79
ktop            0.78
stationed       0.78
ervative        0.77
ouls            0.77
Activations Density: 0.049%