Explanations
references to race and gender
Negative Logits
AsUp              -0.64
kasarigan         -0.57
Pratique          -0.56
حياتها            -0.54
حياته             -0.54
miembro           -0.53
setContentView    -0.52
ніципалі          -0.50
]='\              -0.50
zzleHttp          -0.50
Positive Logits
peers             1.14
counterparts      1.08
contemporaries    1.01
colleagues        0.98
compatriots       0.90
neighbors         0.89
predecessors      0.89
fellow            0.89
peers             0.85
brethren          0.79
Activations Density: 0.350%