INDEX
Explanations
the word "raison" or words containing it
references to the concept of "race" or "racial issues."
Negative Logits
renheit         -0.90
izabeth         -0.84
rate            -0.79
rates           -0.78
sburgh          -0.77
ledged          -0.76
grad            -0.74
Darius          -0.71
sburg           -0.70
ij士            -0.70
Positive Logits
posium          0.80
sidx            0.80
SpaceEngineers  0.74
istically       0.72
agate           0.70
isin            0.70
selves          0.69
srfAttach       0.68
ific            0.67
sem             0.67
Activations Density: 0.057%