INDEX
    Explanations

    instances of movement or wandering

    Negative Logits
    Token        Logit
    "oa"         -0.15
    " Bench"     -0.15
    "roti"       -0.15
    " Hoover"    -0.15
    "ighbors"    -0.15
    "coon"       -0.14
    "ILA"        -0.14
    "ARP"        -0.14
    "acher"      -0.14
    " Grow"      -0.14
    Positive Logits
    Token        Logit
    " freely"    0.16
    "391"        0.15
    "/up"        0.15
    "mf"         0.14
    "467"        0.14
    " alleg"     0.14
    "361"        0.14
    "866"        0.14
    "talk"       0.14
    "rup"        0.14
    Activation Density: 0.067%

    No Known Activations