INDEX
    Explanations

    references to actions or statuses regarding people and their societal roles

    Negative Logits
    Token        Logit
    "ackbar"     -0.18
    "atform"     -0.16
    "icans"      -0.14
    "deaux"      -0.14
    "ulaire"     -0.14
    "olumbia"    -0.14
    "olis"       -0.14
    "dio"        -0.14
    "erç"        -0.14
    "ccione"     -0.14
    Positive Logits
    Token         Logit
    " indeed"     0.15
    "ãi"          0.15
    " Sharing"    0.15
    "ää"          0.15
    "uela"        0.15
    " sharing"    0.15
    "sharing"     0.14
    "ulle"        0.14
    "isch"        0.14
    "only"        0.14
    Act Density 0.011%

    No Known Activations