INDEX
    Explanations

    words related to verticality or directions like up and down

    references to vertical alignment or orientation

    Negative Logits
    keeper    -0.91
    giving    -0.88
    keeping   -0.86
    REDACTED  -0.84
    keepers   -0.83
    unes      -0.77
    ptives    -0.75
    ãģ¦       -0.75
    mberg     -0.75
     [&       -0.74
    Positive Logits
     axis         0.97
     dimension    0.94
     stabil       0.90
     ascent       0.90
     dimensions   0.86
     takeoff      0.85
     velocity     0.85
     vertical     0.85
     integration  0.85
     stripes      0.83
    Act Density: 0.020%

    No Known Activations