INDEX
Explanations
terms related to affirmative action and socioeconomic disparities
Negative Logits
myſelf     -0.82
houſe      -0.80
Theſe      -0.78
Monfieur   -0.77
Houſe      -0.73
Diſ        -0.73
pleaſure   -0.72
faſt       -0.72
ſtate      -0.72
Reſ        -0.71
Positive Logits
to                0.75
or                0.74
and               0.73
versus            0.60
nonetheless       0.51
rather            0.51
nevertheless      0.50
vs                0.49
tagHelperRunner   0.48
through           0.47
Activations Density: 0.400%