INDEX
    Explanations

    gradient descent

    NEGATIVE LOGITS
    Hair        -0.07
    315         -0.07
     TRACE      -0.07
     FAT        -0.06
     Dock       -0.06
    (blank)     -0.06
    qed         -0.06
    _icon       -0.06
    (blank)     -0.06
    مت          -0.06
    POSITIVE LOGITS
     càng       0.07
     ${↵        0.07
    variably    0.07
     gating     0.06
     accuracy   0.06
    ูนย         0.06
    _COLORS     0.06
    _WAKE       0.06
    .skills     0.06
    .',↵        0.06
    Act Density 0.018%

    No Known Activations