INDEX
    Explanations

    phrases indicating distance or extent

    Negative Logits
    Token          Logit
    "874"          -0.18
    "nete"         -0.17
    "ffer"         -0.16
    "chw"          -0.16
    "dings"        -0.15
    "obar"         -0.15
    " broader"     -0.14
    "work"         -0.14
    "IDD"          -0.14
    "eil"          -0.14
    Positive Logits
    Token          Logit
    "-reaching"    0.28
    "thest"        0.23
    "/fast"        0.22
    " away"        0.22
    "away"         0.21
    "mland"        0.20
    " into"        0.20
    "into"         0.19
    " Away"        0.19
    " apart"       0.18
    Activation Density: 0.033%

    No Known Activations