INDEX
    Explanations

    phrases indicating future progress or direction

    phrases that indicate future actions or directions

    Negative Logits
    Token       Logit
    "ulin"      -0.73
    "uminati"   -0.70
    "eness"     -0.68
    "ules"      -0.67
    "oola"      -0.66
    "ises"      -0.65
    "trak"      -0.62
    "uum"       -0.61
    "odor"      -0.59
    "oup"       -0.58
    Positive Logits
    Token         Logit
    " forward"    1.32
    "forward"     1.27
    " into"       1.20
    " forwards"   1.13
    " Into"       1.01
    " Forward"    0.98
    "into"        0.97
    " INTO"       0.96
    " onward"     0.95
    " onwards"    0.91
    Activation Density: 0.079%

    No Known Activations