    Explanations

    gate control and structure

    Negative Logits
    Token        Logit
    (missing)    1.17
    (missing)    1.13
    (missing)    1.05
    ä            1.05
    ва           1.03
    (missing)    1.01
    ája          1.00
    ة            0.98
    ار           0.94
    (missing)    0.94
    Positive Logits
    Token        Logit
    (whitespace) 1.07
    nth          1.05
    list         1.00
    n            0.98
    gate         0.98
    ver          0.97
    F            0.96
    y            0.94
    nine         0.93
    map          0.92
    Activation Density: 0.003%

    No Known Activations