INDEX
    Explanations

    phrases that indicate direction or destination

    NEGATIVE LOGITS
    "up"        -0.20
    "rup"       -0.17
    "wright"    -0.17
    "rift"      -0.15
    "rol"       -0.15
    "au"        -0.15
    "wick"      -0.15
    "nt"        -0.14
    "nap"       -0.14
    " exactly"  -0.14

    POSITIVE LOGITS
    "gether"     0.22
    "obus"       0.20
    "asting"     0.20
    "tes"        0.20
    "chter"      0.20
    "/from"      0.20
    "OLS"        0.20
    "ools"       0.20
    "pline"      0.20
    "wner"       0.19
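
    A minimal sketch of how token lists like these are commonly produced for sparse-autoencoder features. It assumes, without confirmation from this page, that each value is the feature's decoder direction projected through the model's unembedding matrix. The names `top_logit_effects`, `w_dec_f`, `W_U`, and `vocab` are hypothetical, and the toy data stands in for real weights. Any normalization or token filtering this dashboard applies is unknown.

```python
import numpy as np

def top_logit_effects(w_dec_f, W_U, vocab, k=10):
    """Rank tokens by how strongly a feature's decoder direction raises or
    lowers their logits when projected through the unembedding.

    w_dec_f: decoder vector for one feature, shape (d_model,)  [assumed]
    W_U:     unembedding matrix, shape (d_model, d_vocab)      [assumed]
    vocab:   list mapping token id -> token string
    """
    effects = w_dec_f @ W_U          # one logit effect per token, shape (d_vocab,)
    order = np.argsort(effects)      # ascending: most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example: an identity unembedding, so each token's effect equals one
# component of the (hypothetical) decoder vector.
vocab = ["up", "gether", "the", " exactly"]
W_U = np.eye(4)
w_dec_f = np.array([-0.20, 0.22, 0.0, -0.14])
negative, positive = top_logit_effects(w_dec_f, W_U, vocab, k=2)
print(negative)  # most-suppressed tokens
print(positive)  # most-promoted tokens
```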
    Activation density: 0.142%
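
    Activation density is usually read as the share of token positions on which the feature fires. At 0.142%, that would be roughly 142 of every 100,000 tokens. The sketch below assumes "fires" means an activation strictly above 0 and that `acts` holds the feature's activations over a sample of tokens. Both the threshold and the function name `activation_density` are illustrative assumptions, not this dashboard's documented method.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Return the percentage of token positions where the feature's
    activation exceeds `threshold` (assumed 0, i.e. the feature fires)."""
    acts = np.asarray(acts)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size

# Toy data: 142 firing positions out of 100,000 sampled tokens.
acts = np.zeros(100_000)
acts[:142] = 1.3
density = activation_density(acts)
print(f"{density:.3f}%")  # prints "0.142%"
```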

    No Known Activations