INDEX
    Explanations

    touching and not touching

    Negative Logits
    " touch"          0.64
    " способом"       0.63
    "an"              0.62
    " anonymous"      0.61
    " be"             0.60
    " и"              0.60
    "by"              0.58
    " anonymously"    0.58
    " ي"              0.58
    " designate"      0.57
    Positive Logits
    (no token text)   0.63
    " Mountains"      0.52
    "主流"            0.52
    " copp"           0.52
    " README"         0.52
    "ਿੱ"              0.52
    "يد"              0.51
    " Liquor"         0.50
    "大量"            0.50
    " MEX"            0.50
    Activation Density: 0.001%

    No Known Activations