    Negative Logits
    " Deluxe"            -0.07
    " Rever"             -0.07
    (no token shown)     -0.06
    ".Long"              -0.06
    " saldo"             -0.06
    " Silence"           -0.06
    " glBegin"           -0.06
    "\tlight"            -0.06
    " drink"             -0.06
    "681"                -0.06
    Positive Logits
    "vs"                 0.06
    " sorte"             0.06
    " 좋아"              0.06
    " inability"         0.06
    "_COM"               0.06
    " prevailed"         0.06
    "бер"                0.06
    " Conditional"       0.06
    " factual"           0.06
    " ineffective"       0.06
    Activation Density: 0.049%

    No Known Activations