INDEX
Explanations
terms related to biases in decision-making processes
Negative Logits
Token     Logit
ầm        -0.15
¼         -0.15
jang      -0.15
rey       -0.14
(())↵     -0.14
rál       -0.14
dignity   -0.13
dff       -0.13
716       -0.13
croll     -0.13
Positive Logits
Token     Logit
bias      0.59
Bias      0.51
biases    0.50
biased    0.49
bias      0.48
Bias      0.46
_bias     0.43
biased    0.40
åģı       0.38
.bias     0.33
Activations Density: 0.234%