INDEX
    Explanations
    NEGATIVE LOGITS
     insult        -0.07
     burgl         -0.07
    (sort          -0.07
     QUESTION      -0.06
     estimating    -0.06
     autumn        -0.06
     bou           -0.06
     analogous     -0.06
    	id            -0.06
     Farr          -0.06
    POSITIVE LOGITS
    .UInt          0.07
     кри           0.07
    ového          0.07
                   0.06
    	NdrFcShort    0.06
    [^             0.06
     виде          0.06
     záp           0.06
     무료          0.06
    _BB            0.06
    Act Density 0.005%

    No Known Activations