INDEX
    Explanations

    phrases that indicate movement or transformation towards a goal or state

    Negative Logits
    Token           Logit
    "ernet"         -0.16
    "ose"           -0.15
    "579"           -0.15
    "ffa"           -0.15
    "ipt"           -0.14
    "ffer"          -0.14
    "lep"           -0.14
    "trand"         -0.14
    "ersh"          -0.14
    "elts"          -0.14

    Positive Logits
    Token           Logit
    " levels"       0.20
    "stell"         0.16
    " Level"        0.15
    "orelease"      0.15
    " zero"         0.15
    " level"        0.15
    " Poll"         0.15
    " completion"   0.14
    " Levels"       0.14
    "owski"         0.14
    Activation Density: 0.092%

    No Known Activations