    NEGATIVE LOGITS
    ">\"            -0.07
    ".Registry"     -0.07
    "   ↵↵"         -0.07
    "Ubergraph"     -0.06
    "cimal"         -0.06
    "peer"          -0.06
    " trabaj"       -0.06
    "言った"        -0.06
    " sweat"        -0.06
    POSITIVE LOGITS
    ".construct"    0.07
    "irts"          0.06
    " instruct"     0.06
    " UIG"          0.06
    " INTERNAL"     0.06
    " состояние"    0.06
    "_method"       0.06
    "-report"       0.06
    "ープ"          0.06
    " NE"           0.06
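The two lists above rank vocabulary tokens by how strongly this feature lowers (negative) or raises (positive) their logits. The page does not say how these values were computed. A common approach in sparse-autoencoder dashboards is to multiply the feature's decoder direction by the model's unembedding matrix and then take the k smallest and k largest entries. The following is a minimal pure-Python sketch under that assumption. `logit_effects`, `top_and_bottom`, and the toy matrices are hypothetical, and the sketch ignores any normalization the dashboard may apply.

```python
from typing import List, Sequence, Tuple

# Hypothetical helpers: assumes the logit lists come from decoder_row @ W_U,
# with no extra normalization (the dashboard's exact method is not stated).

def logit_effects(decoder_row: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Return decoder_row @ unembed: one logit contribution per vocab token.

    decoder_row has length d_model; unembed is d_model x vocab_size.
    """
    vocab_size = len(unembed[0])
    return [
        sum(decoder_row[i] * unembed[i][j] for i in range(len(decoder_row)))
        for j in range(vocab_size)
    ]


def top_and_bottom(effects: Sequence[float], vocab: Sequence[str], k: int
                   ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and k most positive (token, logit) pairs."""
    order = sorted(range(len(effects)), key=lambda j: effects[j])
    negative = [(vocab[j], round(effects[j], 2)) for j in order[:k]]
    # Largest k entries, listed from most to least positive.
    top = order[max(len(order) - k, 0):]
    positive = [(vocab[j], round(effects[j], 2)) for j in reversed(top)]
    return negative, positive


# Toy example: d_model = 2, vocabulary of 3 tokens.
decoder_row = [1.0, -0.5]
unembed = [[0.02, -0.04, 0.06],
           [0.00, 0.06, -0.02]]
effects = logit_effects(decoder_row, unembed)
neg, pos = top_and_bottom(effects, ["a", "b", "c"], k=1)
print(neg, pos)  # [('b', -0.07)] [('c', 0.07)]
```

With a real model, `decoder_row` would be the feature's row of the SAE decoder and `unembed` the model's unembedding matrix. The vocabulary would then be the tokenizer's, which is why entries such as " trabaj" keep their leading space.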
    Activation density: 0.005%

    No Known Activations
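Activation density is the share of sampled tokens on which the feature fires. The page states neither the threshold nor the sample it used. Assuming "fires" means an activation above zero, 0.005% corresponds to roughly 5 tokens in every 100,000. The sketch below is minimal, and `activation_density` is a hypothetical helper.

```python
from typing import Sequence

# Hypothetical helper: assumes a token "fires" when its activation exceeds
# `threshold` (0.0 by default); the threshold used for this page is not stated.
def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the percentage of tokens whose activation exceeds threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# 5 firing tokens out of 100,000 sampled tokens.
acts = [0.0] * 100_000
for idx in (10, 2_000, 30_000, 55_555, 99_999):
    acts[idx] = 1.3
print(f"{activation_density(acts):.3f}%")  # 0.005%
```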