INDEX
Explanations
connections or links between concepts or entities
Negative Logits
quished           -0.94
\\\\\\\\          -0.88
OGR               -0.83
TPPStreamerBot    -0.79
////////////////  -0.78
stal              -0.76
nell              -0.75
abases            -0.74
dp                -0.73
entimes           -0.73
Positive Logits
sexes             0.81
criminality       0.75
genders           0.75
these             0.75
ethnicity         0.74
disparate         0.70
academics         0.67
humans            0.67
counties          0.67
geography         0.66
Activations Density 0.029%