INDEX
    Explanations

    actions that are performed exceptionally or with high success

    Negative Logits
    Token       Logit
    cki         -0.18
    út          -0.15
    ">//        -0.14
    halt        -0.14
    stairs      -0.14
    adecimal    -0.14
    ufe         -0.14
    htable      -0.14
    ovsky       -0.14
     Fa         -0.14
    Positive Logits
    Token       Logit
    fox         0.27
    pace        0.26
    smart       0.26
    mus         0.25
    gun         0.24
    bid         0.24
    score       0.23
    strip       0.22
    distance    0.22
    match       0.22
    Activation Density: 0.010%

    No Known Activations