INDEX
    Explanations

    suggestions for improving code or functionality

    Negative Logits
    "atsby"          -0.17
    " reused"        -0.17
    "лат"            -0.16
    "amburger"       -0.16
    "warts"          -0.15
    "reuse"          -0.15
    "_STANDARD"      -0.14
    "arkan"          -0.14
    "»"              -0.14
    "uste"           -0.14

    Positive Logits
    " instead"        0.21
    " separate"       0.19
    " resort"         0.17
    "instead"         0.16
    "Instead"         0.15
    " Roh"            0.15
    " separately"     0.15
    " Instead"        0.15
    "_wrapper"        0.15
    " wrapper"        0.15
    Act Density 0.107%

    No Known Activations