Explanations
instances of the word "race" and its variations
Negative Logits
Token        Logit
olia         -0.76
iar          -0.76
oca          -0.75
arial        -0.73
tymology     -0.72
lishes       -0.71
berra        -0.71
vironment    -0.67
osis         -0.66
iated        -0.65
Positive Logits
Token        Logit
horse        1.31
course       1.29
cars         1.05
bike         1.04
car          0.98
nell         0.84
runners      0.83
runner       0.83
bikes        0.82
goers        0.79
Activation Density: 0.018%