INDEX
    Explanations

    terms related to biases in decision-making processes

    Negative Logits
    Token         Logit
    "ầm"          -0.15
    "¼"           -0.15
    "jang"        -0.15
    "rey"         -0.14
    "(())↵"       -0.14
    "rál"         -0.14
    " dignity"    -0.13
    "dff"         -0.13
    "716"         -0.13
    "croll"       -0.13
    Positive Logits
    Token         Logit
    " bias"       0.59
    " Bias"       0.51
    " biases"     0.50
    " biased"     0.49
    "bias"        0.48
    "Bias"        0.46
    "_bias"       0.43
    "biased"      0.40
    "åģı"         0.38
    ".bias"       0.33
    Activation Density: 0.234%

    No Known Activations