    Explanations

    terms related to intentions or goals

    phrases expressing intentions or future actions

    Negative Logits
    Token        Logit
    "roy"        -0.81
    "worth"      -0.73
    "shown"      -0.71
    "workers"    -0.68
    "eros"       -0.64
    "mask"       -0.62
    "owners"     -0.61
    "bur"        -0.60
    "trust"      -0.59
    "voc"        -0.59

    Positive Logits
    Token             Logit
    " emulate"        1.03
    " improve"        0.92
    " avoid"          0.90
    " broaden"        0.90
    " resume"         0.86
    " tighten"        0.86
    " incorporate"    0.85
    " maximize"       0.84
    " eliminate"      0.83
    " conserve"       0.83

    Activation Density: 0.054%

    No Known Activations