INDEX
    Explanations

    bias and confirmation bias

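    Negative Logits
        " art"          1.19
        "art"           1.15
        "艺术"          1.07   (Chinese: "art")
        " искусства"    1.03   (Russian: "of art")
        "藝術"          1.02   (Traditional Chinese: "art")
        " Art"          1.01
        "Art"           1.00
        " arts"         1.00
        " Arts"         0.93
        " искусство"    0.89   (Russian: "art")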
    Positive Logits
        " bias"         3.03
        " biases"       2.89
        " Bias"         2.86
        "Bias"          2.81
        " biased"       2.71
        "bias"          2.43
        "biased"        2.26
        " পক্ষপাত"      2.12   (Bengali: "bias")
        " biasing"      2.09
        " prejudices"   1.80
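    Feature dashboards of this kind commonly build the two logit lists by projecting the feature's decoder direction through the model's unembedding matrix and sorting the per-token scores. This page does not state its method, so the sketch below is an assumption: `top_logit_effects` is a hypothetical helper, and the two-dimensional decoder vector and five-token unembedding are made-up toy values, not data from this feature.

```python
from typing import Dict, List, Tuple


def top_logit_effects(
    decoder_dir: List[float],
    unembed: Dict[str, List[float]],
    k: int = 3,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Hypothetical helper: score each token as dot(decoder_dir, its
    unembedding vector), then return (top-k positive, top-k negative)."""
    scores = {
        token: sum(d * u for d, u in zip(decoder_dir, vec))
        for token, vec in unembed.items()
    }
    ranked = sorted(scores.items(), key=lambda item: item[1])  # ascending
    positive = list(reversed(ranked[-k:]))  # largest scores first
    negative = ranked[:k]                   # most negative scores first
    return positive, negative


# Toy values only: a 2-d "decoder direction" and five vocabulary entries.
decoder_dir = [1.0, -0.5]
unembed = {
    " bias": [2.0, 0.0],
    " biased": [1.5, 0.5],
    " the": [0.25, 0.5],
    " arts": [-0.5, 1.0],
    " art": [-1.0, 0.5],
}
pos, neg = top_logit_effects(decoder_dir, unembed, k=2)
print(pos)  # [(' bias', 2.0), (' biased', 1.25)]
print(neg)  # [(' art', -1.25), (' arts', -1.0)]
```

    Under this assumption, a feature that pushes up " bias"-like tokens while pushing down " art"-like tokens in several languages would produce lists shaped like the ones above.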
    Activation density: 0.236%
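    Activation density is usually read as the share of tokens on which the feature fires. The sketch below assumes that definition with "fires" meaning an activation strictly above a threshold of 0; `activation_density` and the toy activation list are hypothetical, not taken from this page. At the 0.236% reported above, a 100,000-token sample would contain about 236 active tokens.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of tokens whose activation exceeds
    `threshold` (assumed to be 0, i.e. any positive activation counts)."""
    if not activations:
        raise ValueError("need at least one token activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Toy data: 3 of 1,000 tokens have a positive activation.
acts = [0.0] * 1000
acts[10] = 2.5
acts[200] = 0.7
acts[999] = 4.1
print(activation_density(acts))  # 0.3
```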

    No Known Activations