    Explanations

    comparisons of how different individuals treat others, focusing on terms like "comrades" and "servants."

    Negative Logits
    Minang         -0.66
    disagre        -0.66
    tolerably      -0.62
    nobly          -0.62
    vainly         -0.61
    impra          -0.61
    Putih          -0.61
    profuse        -0.61
    imperfectly    -0.61
    unspeak        -0.60
    Positive Logits
    prostitu       0.67
    divertimento   0.66
    tutt           0.64
    religione      0.63
    teolog         0.62
    meras          0.59
    palio          0.59
    rilass         0.58
    fatte          0.58
    parteci        0.58
    Act Density 0.374%

    No Known Activations