    Explanations

    words related to the direction of events or situations

    references to the status or condition of things

    Negative Logits
    iciency
    -0.72
    essor
    -0.65
    tein
    -0.61
     disav
    -0.59
    asking
    -0.59
    ¿½
    -0.59
    ritical
    -0.59
     sole
    -0.58
     overwrite
    -0.57
    ividual
    -0.57
    Positive Logits
     downhill   0.77
     for        0.75
     between    0.73
     backstage  0.71
     unfold     0.68
     roy        0.65
     huh        0.64
     diplom     0.63
    Things      0.63
    FOR         0.62
    Act Density 0.305%

    No Known Activations