    Explanations

    phrases indicating progress or advancements towards goals

    Negative Logits
    Token          Logit
    "urban"        -0.67
    " Males"       -0.62
    " Directions"  -0.61
    " followed"    -0.59
    "thia"         -0.59
    " trailing"    -0.59
    "akia"         -0.59
    " Cf"          -0.58
    " commemor"    -0.57
    " escapes"     -0.57
    Positive Logits
    Token          Logit
    " notch"       1.00
    " levels"      0.98
    " level"       0.98
    " heights"     0.96
    " brink"       0.95
    " extremes"    0.83
    " absurdity"   0.82
    " depths"      0.81
    " point"       0.80
    " perfection"  0.79
    Activation Density: 0.179%

    No Known Activations