    Negative Logits
    Token              Logit
    " sinister"        -0.07
    "(identity"        -0.07
    "�回"              -0.07
                       -0.06
    "POSITORY"         -0.06
    " inte"            -0.06
    " communicated"    -0.06
    "_cod"             -0.06
    " Romeo"           -0.06
    " Himself"         -0.06
    Positive Logits
    Token              Logit
    ",column"          0.06
    " removeFrom"      0.06
    "eq"               0.06
    ".VERTICAL"        0.06
    ";br"              0.06
    " неб"             0.06
                       0.06
    "(features"        0.06
    "-as"              0.06
    "(deg"             0.06
    Activation Density: 0.002%

    No Known Activations