Explanations
names of universities or educational institutions
Negative Logits
Token     Logit
WARD      -0.81
itives    -0.79
ences     -0.78
ports     -0.77
rahim     -0.73
ged       -0.69
Gors      -0.69
enced     -0.68
ded       -0.68
nings     -0.68
Positive Logits
Token     Logit
bones     0.93
agne      0.85
anooga    0.82
ron       0.79
avez      0.77
osen      0.75
bara      0.74
aign      0.72
unin      0.71
icago     0.70
Activations Density 0.059%