INDEX
Explanations
key concepts related to importance and significance in various contexts
Head Attr Weights
0:0.02
1:0.03
2:0.09
3:0.17
4:0.16
5:0.03
6:0.09
7:0.08
8:0.04
9:0.06
10:0.08
11:0.08
Negative Logits
��:-1.61
tradem:-1.60
iversal:-1.54
��極:-1.46
�:-1.45
ooked:-1.45
��:-1.43
redibly:-1.43
rius:-1.40
���:-1.40
Positive Logits
inherent:1.91
pitfalls:1.77
misconceptions:1.74
aspects:1.74
disparity:1.72
differences:1.71
injust:1.69
disparities:1.68
aspect:1.68
inequalities:1.65
Activations Density 0.303%