INDEX
    Explanations

    directional words and phrases indicating movement or position

    Negative Logits
    rams          -0.67
    ivities       -0.67
    ories         -0.63
    usc           -0.62
    atively       -0.61
    xious         -0.60
    sels          -0.60
    uren          -0.58
    iny           -0.58
    matically     -0.57
    Positive Logits
    cliffe         0.70
    ruary          0.67
    stage          0.62
     Vulcan        0.62
    ategory        0.60
    stairs         0.60
    othal          0.60
    hovah          0.60
    flix           0.59
    WARD           0.59
    Activation Density: 0.011%

    No Known Activations