    Negative Logits
    Token           Logit
    "internet"      -0.06
    " Vern"         -0.06
    "ths"           -0.06
    " freq"         -0.06
    " Arth"         -0.06
    " remnants"     -0.06
    "_rt"           -0.06
    "IColor"        -0.06
    " redundancy"   -0.06
    " rigged"       -0.06
    Positive Logits
    Token           Logit
    " close"        0.14
    " Close"        0.11
    "Close"         0.11
    " closer"       0.10
    "close"         0.09
    "/close"        0.09
    " CLOSE"        0.08
    " sıcak"        0.08
    "-close"        0.08
    "(close"        0.07
    Activation density: 0.024%

    No Known Activations