Explanations
references to various nationalities or groups of people
Negative Logits
gger      -0.18
lessly    -0.18
illard    -0.17
パン      -0.15
acular    -0.15
olson     -0.15
ayer      -0.14
atform    -0.14
ácil      -0.14
人员      -0.14
Positive Logits
-American     0.17
who           0.16
anness        0.15
-only         0.15
ischer        0.14
-made         0.14
tons          0.14
addslashes    0.14
.gwt          0.13
fing          0.13
Activations Density 0.097%