INDEX
    Explanations

    discussions about the treatment of individuals, particularly in relation to equality and respect across different contexts

    Negative Logits
    Token               Logit
    " defaultstate"     -0.41
    "優れた"            -0.38
    " [*]"              -0.36
    " effective"        -0.36
    " readily"          -0.35
    " tarvit"           -0.35
    " чудо"             -0.35
    " expérimentés"     -0.35
                        -0.35
    " valid"            -0.34
    Positive Logits
    Token               Logit
    " differently"      1.39
    " correctly"        0.82
    " accordingly"      0.82
    " incorrectly"      0.80
    "correctly"         0.74
    " diffé"            0.73
    "differ"            0.73
    " similarly"        0.71
    " Differ"           0.71
    " autrement"        0.66
    Act Density 0.582%
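
    As a rough illustration of what an activation density figure such as the one above usually means, the sketch below computes the fraction of token positions on which a feature fires. It assumes density is defined as "activation above a threshold, divided by total tokens"; the function name, the threshold default, and the sample values are hypothetical and do not come from this dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of tokens on which the feature
    is active. The dashboard's exact definition may differ.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical activations for 1000 tokens: the feature fires on 6 of them.
sample = [0.0] * 994 + [0.7, 1.2, 0.3, 2.1, 0.9, 0.4]
print(f"{activation_density(sample):.3%}")  # prints "0.600%"
```

    Under this reading, a density of 0.582% would mean the feature is active on roughly 6 of every 1000 tokens in the sampled text.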

    No Known Activations