INDEX
    Explanations

    phrases that indicate upward movement or progress

    Negative Logits
    " downhill"      -0.32
    " downwards"     -0.25
    " downward"      -0.25
    " decreasing"    -0.24
    " decreased"     -0.23
    " Down"          -0.22
    "down"           -0.22
    " Decre"         -0.22
    "Decre"          -0.22
    " down"          -0.21
    Positive Logits
    " rise"           0.49
    " rises"          0.44
    " rising"         0.43
    " climb"          0.42
    " ascent"         0.42
    " risen"          0.42
    " climbing"       0.41
    " Rise"           0.40
    " raise"          0.40
    "-rise"           0.40
    Act Density 0.221%

    No Known Activations