    Negative Logits
    -0.08
     English
    -0.07
    InParameter
    -0.07
    善意
    -0.07
     wonder
    -0.07
     camouflage
    -0.06
    _RANDOM
    -0.06
    .F
    -0.06
    opus
    -0.06
    .length
    -0.06
    Positive Logits
     księg
    0.07
     applies
    0.07
    基辅
    0.07
    𬤊
    0.07
    奠定了
    0.07
     Uploaded
    0.07
    _XDECREF
    0.07
    естеств
    0.07
    0.07
     prevailed
    0.07
    Act Density 0.004%

    No Known Activations