INDEX
    Explanations
    Negative Logits
    Token             Logit
    " ledger"         -0.07
    " trick"          -0.07
    " positivity"     -0.07
    "blockquote"      -0.07
    "/ay"             -0.07
    " Plaza"          -0.07
    " marketplace"    -0.07
    " kombin"         -0.06
    " pok"            -0.06
    "Around"          -0.06
    Positive Logits
    Token             Logit
    " Self"           0.09
    " self"           0.09
    " SELF"           0.08
    "\tself"          0.08
    "eff"             0.08
    " сильно"         0.08
    " eff"            0.08
    "Self"            0.07
    "ев"              0.07
    "OFF"             0.07
    Activation Density: 0.035%

    No Known Activations