    Negative Logits
      Token              Logit
      " holy"            -0.07
      "*cos"             -0.06
      " normalized"      -0.06
      " cops"            -0.06
      "_apply"           -0.06
      " secretion"       -0.06
      " gauss"           -0.06
      "_attack"          -0.06
      ".shiro"           -0.06
      " bear"            -0.06
    Positive Logits
      Token              Logit
      " disappointed"     0.13
      " disappointment"   0.12
      " disappoint"       0.09
      " disappointing"    0.09
      " dashed"           0.09
      "icap"              0.08
      " Mal"              0.07
      " unforgettable"    0.07
      "işim"              0.07
      "dhcp"              0.07
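A minimal sketch of how ranked logit tables like the two above could be produced. It assumes, as is common for sparse-autoencoder feature dashboards but not stated here, that each value is the dot product of the feature's decoder direction with a token's unembedding vector. The function name `top_logits`, its parameters, and the toy vectors are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple


def top_logits(
    decoder_dir: List[float],
    unembed: Dict[str, List[float]],
    k: int,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Hypothetical helper: return the k most negative and k most positive
    token logit effects of one feature direction (assumed method)."""
    # Assumption: logit effect = dot(decoder direction, token unembedding).
    scores = {
        tok: sum(d * u for d, u in zip(decoder_dir, vec))
        for tok, vec in unembed.items()
    }
    ranked = sorted(scores.items(), key=lambda kv: kv[1])  # ascending
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return negative, positive


# Toy 2-dimensional example (made-up vectors, not real model weights).
neg, pos = top_logits(
    [1.0, 0.0],
    {
        " disappointed": [0.13, 0.5],
        " holy": [-0.07, 0.2],
        " bear": [-0.06, 0.0],
        " dashed": [0.09, 1.0],
    },
    k=2,
)
print(neg)
print(pos)
```

Quoting the tokens keeps the leading space visible, which matters because `" disappoint"` and `"disappoint"` are usually distinct vocabulary entries.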
    Activation density: 0.008%

    No known activations.
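For scale, a density of 0.008% means roughly 8 active positions per 100,000 tokens. The sketch below assumes that density is the percentage of token positions where the feature's activation is strictly above zero; the function name `activation_density` and the `threshold` parameter are hypothetical.

```python
from typing import List


def activation_density(activations: List[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of token positions whose activation
    exceeds `threshold` (assumed definition of activation density)."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# 8 active positions out of 100,000 tokens (made-up activations).
acts = [0.0] * 99_992 + [1.0] * 8
print(activation_density(acts))
```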