INDEX
    Explanations

    movement directions (up, down, left, right)

    Negative Logits
    (token not displayed)  1.00
    (token not displayed)  0.93
    (token not displayed)  0.89
    "𒆪"  0.88
    " проведение"  0.87
    " കൊല്ല"  0.87
    "ת"  0.86
    " haga"  0.85
    "Созда"  0.84
    "定义"  0.82
    Positive Logits
    "o"  0.72
    " J"  0.69
    " d"  0.68
    "oise"  0.67
    " recirc"  0.66
    "oes"  0.66
    " D"  0.64
    " N"  0.64
    " lut"  0.63
    "erver"  0.63
    Activation Density 0.001%

    No Known Activations