INDEX
    Explanations

    words related to judgment and critique

    lists of items or instances separated by commas

    Negative Logits
    OND     -0.79
    oir     -0.72
    Tech    -0.67
    HAM     -0.66
    hers    -0.63
    isers   -0.62
    CLOSE   -0.62
    оÐ      -0.62
    OUGH    -0.61
    Russ    -0.61
    Positive Logits
     nevertheless  0.70
     disg          0.66
     dominates     0.65
     etc           0.65
     annihil       0.64
     circa         0.62
     massac        0.62
     litter        0.61
     behaved       0.61
    artifacts      0.61
    Activation Density: 0.375%

    No Known Activations