INDEX
    Explanations

    phrases related to steps or actions needed to achieve a specific goal

    phrases related to instructions or guidelines for achieving tasks

    Negative Logits
    " outweigh"    -0.78
    " outwe"       -0.68
    " drowned"     -0.67
    "marine"       -0.65
    " benches"     -0.65
    " alive"       -0.63
    "burning"      -0.63
    " quot"        -0.62
    "pez"          -0.62
    " harb"        -0.61
    Positive Logits
    "aucus"        0.73
    "itialized"    0.68
    " nutshell"    0.66
    " Brief"       0.65
    "ertain"       0.64
    " recap"       0.64
    "zbek"         0.64
    " meantime"    0.63
    " disclaimer"  0.62
    "kinson"       0.62
    Activation Density: 0.121%

    No Known Activations