    Explanations
    Negative Logits
    Token          Logit
    ' friendly'    -0.06
    '="//'         -0.06
    'ัท'           -0.06
    'bus'          -0.06
    ' Elephant'    -0.06
    'uces'         -0.05
    ' Liebe'       -0.05
    ' States'      -0.05
    ' patter'      -0.05
    ' leg'         -0.05
    Positive Logits
    Token          Logit
    'щи'           0.08
    (blank)        0.07
    ' ASF'         0.07
    '\tglBind'     0.07
    '.bias'        0.07
    ' #=>'         0.07
    ' Because'     0.07
    ' tane'        0.07
    '$s'           0.07
    (blank)        0.06
    Act Density 0.061%

    No Known Activations