INDEX
    Explanations
    Negative Logits
     expose          -0.07
     skin            -0.07
     clash           -0.07
    skin             -0.07
    holz             -0.07
     verdadeira      -0.07
    man              -0.07
    #:               -0.07
    amage            -0.07
    arti             -0.07
    Positive Logits
     outward         0.10
     inward          0.10
     hacia           0.09
    [right           0.09
    (token not rendered)  0.09
     perpendicular   0.09
     istiq           0.09
     sucked          0.08
    ేట్              0.08
    রত               0.08
    Act Density 0.012%

    No Known Activations