INDEX
    Explanations

    phrases related to goals and aims

    statements of goals or intentions

    Negative Logits
    Token       Logit
    "odox"      -0.63
    "vae"       -0.60
    "regular"   -0.58
    "affe"      -0.58
    "aro"       -0.56
    "quin"      -0.55
    "iery"      -0.55
    " occas"    -0.55
    "icol"      -0.52
    "eor"       -0.52
    Positive Logits
    Token             Logit
    " to"             1.13
    "to"              0.96
    " maximizing"     0.84
    " ensuring"       0.80
    " preservation"   0.79
    " To"             0.77
    " simple"         0.74
    " simplicity"     0.73
    " educating"      0.71
    " TO"             0.71
    Activation Density: 0.127%

    No Known Activations