INDEX
    Explanations

    key concepts related to importance and significance in various contexts

    Head Attr Weights
    0:0.02
    1:0.03
    2:0.09
    3:0.17
    4:0.16
    5:0.03
    6:0.09
    7:0.08
    8:0.04
    9:0.06
    10:0.08
    11:0.08
    Negative Logits
    ��
    -1.61
     tradem
    -1.60
    iversal
    -1.54
    ��極
    -1.46
    -1.45
    ooked
    -1.45
    ��
    -1.43
    redibly
    -1.43
    rius
    -1.40
    ���
    -1.40
    Positive Logits
     inherent
    1.91
     pitfalls
    1.77
     misconceptions
    1.74
     aspects
    1.74
     disparity
    1.72
     differences
    1.71
     injust
    1.69
     disparities
    1.68
     aspect
    1.68
     inequalities
    1.65
    Act Density 0.303%

    No Known Activations