INDEX
    Explanations

    instances where the concept of "fairness" is mentioned

    instances of the concept of fairness

    Negative Logits
        Token            Logit
        "CHAT"           -0.83
        "apse"           -0.81
        "uality"         -0.72
        " Saga"          -0.66
        " Reincarnated"  -0.66
        "Ultra"          -0.65
        "clips"          -0.65
        "<"              -0.65
        "OUS"            -0.64
        "Extra"          -0.62

    Positive Logits
        Token            Logit
        "grounds"         1.25
        "yt"              1.10
        " fair"           0.90
        "ground"          0.90
        "fair"            0.89
        "iciary"          0.83
        "abouts"          0.81
        "itably"          0.79
        "child"           0.79
        "heet"            0.77
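Feature dashboards commonly produce lists like these by projecting the feature's decoder direction through the model's unembedding matrix and ranking vocabulary tokens by the result. That this dashboard computes its lists this way is an assumption. The sketch below uses pure Python. All names (`logit_effects`, `top_and_bottom`) and the toy numbers are hypothetical, not data from this feature. It also shows why `"fair"` and `" fair"` (leading space) count as separate tokens with separate scores.

```python
# Hypothetical sketch: rank tokens by the direct logit effect of one feature,
# assuming score_j = sum_i decoder_row[i] * unembed[i][j] (i.e. decoder_row @ W_U).
# Names and toy values are illustrative assumptions, not this dashboard's data.

def logit_effects(decoder_row, unembed, vocab):
    """Return (token, score) pairs for every token in the vocabulary."""
    d_model = len(decoder_row)
    scores = []
    for j, token in enumerate(vocab):
        # Dot product of the decoder direction with column j of the unembedding.
        score = sum(decoder_row[i] * unembed[i][j] for i in range(d_model))
        scores.append((token, score))
    return scores

def top_and_bottom(scores, k):
    """Return the k most positive and the k most negative (token, score) pairs."""
    positive = sorted(scores, key=lambda pair: pair[1], reverse=True)[:k]
    negative = sorted(scores, key=lambda pair: pair[1])[:k]
    return positive, negative

# Toy example: d_model = 2, a four-token vocabulary (hypothetical values).
vocab = ["fair", " fair", "CHAT", "apse"]
decoder_row = [1.0, 0.5]
unembed = [[0.6, 0.5, -0.4, -0.3],
           [0.4, 0.2, -0.2, -0.1]]

positive, negative = top_and_bottom(logit_effects(decoder_row, unembed, vocab), k=2)
print([t for t, _ in positive])  # ['fair', ' fair']
print([t for t, _ in negative])  # ['CHAT', 'apse']
```

A real dashboard would do this as one matrix-vector product over the full vocabulary. The explicit loop only keeps the arithmetic visible.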
    Activation density: 0.015%

    No known activations.
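An activation density of 0.015% means the feature fires on roughly 15 of every 100,000 tokens, about 1 in 6,667. The sketch below shows one plausible way to compute such a figure. It assumes a position counts as "active" when its activation exceeds zero, measured over some sampled set of tokens; this page does not state the actual threshold or dataset. The function name `activation_density` and the toy data are hypothetical.

```python
# Hypothetical sketch: activation density = fraction of token positions where the
# feature's activation exceeds a threshold (assumed to be 0.0 here).

def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)

# Toy data (hypothetical): 15 active positions out of 100,000 sampled tokens.
acts = [0.0] * 100_000
for pos in range(0, 15 * 1000, 1000):
    acts[pos] = 1.0

density = activation_density(acts)
print(f"{density:.3%}")  # 0.015%
```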