INDEX
    Explanations

    words indicating strong positive or negative evaluations

    Negative Logits
    " feasibility"      -0.72
    " resolutions"      -0.69
    "olutions"          -0.68
    " Abstract"         -0.68
    "urances"           -0.68
    "ancies"            -0.65
    "inav"              -0.64
    "uld"               -0.64
    " conducted"        -0.64
    "ongyang"           -0.63

    Positive Logits
    " incarn"           0.92
    " sleeper"          0.87
    "apego"             0.79
    " keeper"           0.79
    " admire"           0.73
    " collaborator"     0.73
    " breed"            0.71
    " performer"        0.71
    "lier"              0.70
    " messenger"        0.69
    Activation Density: 0.158%

    No Known Activations