    Negative Logits
    " punitive"    -0.07
    "_io"          -0.06
    "variation"    -0.06
    "_ld"          -0.06
    "ertype"       -0.06
    " aromatic"    -0.06
    " leftist"     -0.06
    "jectives"     -0.06
    "dao"          -0.06
    "(form"        -0.06
    Positive Logits
    [token not rendered]    0.08
    [token not rendered]    0.07
    " fakt"                 0.07
    " Tick"                 0.07
    [token not rendered]    0.07
    "\tPrint"               0.06
    " anyway"               0.06
    "(feed"                 0.06
    " инструк"              0.06
    " बत"                   0.06
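Logit lists like the ones above are commonly obtained by projecting the feature's decoder direction through the model's unembedding matrix and sorting the resulting per-token scores. The sketch below assumes that convention; the function name `top_logit_tokens` and the arguments `w_dec_row` (one feature's decoder vector, shape `d_model`), `W_U` (unembedding, shape `d_model x d_vocab`) and `vocab` are hypothetical illustrations, not this dashboard's actual code.

```python
import numpy as np

def top_logit_tokens(w_dec_row, W_U, vocab, k=10):
    """Return the k most negative and k most positive (token, logit) pairs
    for one feature, assuming logits = decoder direction @ unembedding."""
    # Direct effect of the feature direction on every vocabulary logit: shape (d_vocab,)
    logits = w_dec_row @ W_U
    order = np.argsort(logits)  # indices sorted by ascending logit
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example: d_model = 2, vocabulary of 4 tokens (all values are made up).
w = np.array([1.0, 0.0])
W_U = np.array([[0.5, -0.2, 0.1, -0.7],
                [0.0,  0.0, 0.0,  0.0]])
neg, pos = top_logit_tokens(w, W_U, ["a", "b", "c", "d"], k=2)
print(neg)  # tokens "d" then "b"
print(pos)  # tokens "a" then "c"
```

Because only the decoder direction and the unembedding are involved, these scores describe the feature's direct push on next-token logits and ignore any effect routed through later layers.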
    Activation Density 0.138%
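Activation density is usually read as the fraction of tokens on which the feature fires, so 0.138% corresponds to roughly 1.4 tokens per 1,000. A minimal sketch, assuming "fires" means an activation strictly above a threshold of 0 (the function name `activation_density` and the `threshold` parameter are hypothetical):

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Fraction of tokens whose feature activation exceeds `threshold`."""
    acts = np.asarray(acts)
    return float(np.mean(acts > threshold))

# Synthetic example: the feature fires on 138 out of 100,000 tokens.
acts = np.zeros(100_000)
acts[:138] = 1.0
print(f"{activation_density(acts):.3%}")  # prints 0.138%
```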

    No Known Activations