INDEX
    Explanations

    words or phrases conveying fairness and equality

    Negative Logits
    "hip"          -0.15
    "enk"          -0.15
    " sucker"      -0.14
    ".vo"          -0.14
    " γά"          -0.14
    " Interval"    -0.14
    " seams"       -0.14
    "coll"         -0.14
    "cas"          -0.13
    "ven"          -0.13
    Positive Logits
    "raki"         0.15
    "jab"          0.14
    "715"          0.14
    "rupa"         0.14
    "rup"          0.14
    "ī´"           0.14
    "ransition"    0.14
    "gaard"        0.14
    "ulton"        0.14
    "Clr"          0.14
    Activation Density: 0.004%

    No Known Activations