INDEX
    Explanations

    phrases indicating distinguishing characteristics or unique features

    phrases that identify distinguishing characteristics or qualities

    Negative Logits
    Token              Logit
    "endix"            -0.80
    "lance"            -0.75
    "lex"              -0.74
    "ixon"             -0.72
    "xon"              -0.70
    "erenn"            -0.69
    "odder"            -0.67
    "rentice"          -0.67
    "tch"              -0.66
    "cow"              -0.66

    Positive Logits
    Token              Logit
    " them"            0.71
    "orno"             0.69
    " Cu"              0.69
    "rament"           0.67
    " distinguishes"   0.67
    " humanity"        0.67
    " Sanct"           0.66
    "ably"             0.64
    " these"           0.63
    " Tyrann"          0.63

    Activation Density: 0.107%

    No Known Activations