    Negative Logits
    Token            Logit
    " Belt"          -0.07
    "_emb"           -0.07
    " Fix"           -0.06
    " FontWeight"    -0.06
    "_variance"      -0.06
    "нивер"          -0.06
    " Vanilla"       -0.06
    " filler"        -0.06
    "/rec"           -0.06
    " ב"             -0.06
    Positive Logits
    Token            Logit
    " İslam"         0.08
    (not shown)      0.07
    "oteca"          0.07
    "arden"          0.07
    " Congressman"   0.06
    " NAND"          0.06
    " Interrupt"     0.06
    " Airlines"      0.06
    (not shown)      0.06
    "\tprivate"      0.06
    Activation Density: 0.200%

    No Known Activations