    Explanations

    phrases indicating direction or movement

    phrases indicating direction or movement toward a specific target or outcome

    Negative Logits
    Token       Logit
    "cu"        -0.80
    "glers"     -0.72
    "drivers"   -0.66
    "̶"         -0.66
    "eyes"      -0.65
    "amp"       -0.64
    "chin"      -0.64
    "hai"       -0.64
    "chu"       -0.63
    "span"      -0.63
    Positive Logits
    Token            Logit
    " adulthood"     0.98
    " extinction"    0.96
    " achieving"     0.96
    "wards"          0.95
    "WARD"           0.88
    " completion"    0.88
    " infinity"      0.87
    " completing"    0.86
    " solving"       0.83
    " becoming"      0.82
    Activation Density: 0.058%

    No Known Activations