INDEX
    Explanations
    Negative Logits
     appear            -0.07
     garments          -0.06
    IRROR              -0.06
     homogeneous       -0.06
     dorm              -0.06
                       -0.06
     momentarily       -0.06
    Library            -0.06
     retract           -0.06
    --------------↵    -0.06
    Positive Logits
     downside          0.11
     Ups               0.09
     upside            0.09
     وش                0.08
    Ups                0.07
     라이               0.07
    upos               0.07
     уси               0.07
    hoe                0.07
    iven               0.07
    Act Density 0.004%

    No Known Activations