    Negative Logits
    " бизне"     0.45
    "ܬܐ"         0.40
    "ษัท"         0.39
    "aii"        0.39
    "Fraud"      0.39
    " инфек"     0.38
    "tamil"      0.38
    " yaar"      0.37
    "ijl"        0.36
    "):("        0.36
    Positive Logits
    " Part"      0.66
    " drag"      0.62
    " Drag"      0.59
    " Submit"    0.55
    " dragging"  0.55
    "Drag"       0.52
    "drag"       0.51
    " drags"     0.50
    " My"        0.49
    "Part"       0.48
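In SAE feature dashboards of this kind, top positive and negative logit lists are typically obtained by projecting the feature's decoder direction through the model's unembedding matrix and ranking vocabulary tokens by the result. The sketch below assumes that computation. The names `w_dec_row`, `w_unembed`, and `top_logits` are hypothetical, and the toy matrices are illustrative only, not this feature's actual weights.

```python
import numpy as np

def top_logits(w_dec_row, w_unembed, vocab, k=5):
    """Rank tokens by the feature's direct effect on the logits (assumed method).

    w_dec_row: decoder direction for one feature, shape (d_model,)
    w_unembed: unembedding matrix, shape (d_model, vocab_size)
    vocab:     list of token strings, length vocab_size
    """
    # Project the decoder direction into vocabulary space.
    logits = w_dec_row @ w_unembed  # shape: (vocab_size,)
    order = np.argsort(logits)      # ascending order
    # Largest values are the positive logits, smallest are the negative logits.
    top_pos = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    top_neg = [(vocab[i], float(logits[i])) for i in order[:k]]
    return top_pos, top_neg

# Toy example: d_model = 2, vocabulary of 4 tokens.
vocab = [" drag", " Fraud", " Part", "tamil"]
w_dec_row = np.array([1.0, 0.0])
w_unembed = np.array([[0.6, -0.4, 0.1, 0.0],
                      [0.0,  0.0, 0.0, 1.0]])
pos, neg = top_logits(w_dec_row, w_unembed, vocab, k=2)
print("positive:", pos)
print("negative:", neg)
```

Sorting the full projection once and slicing both ends keeps the positive and negative lists consistent with each other.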
    Activation Density: 0.005%

    No Known Activations
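Activation density is commonly reported as the fraction of sampled tokens on which the feature fires. A minimal sketch, assuming "fires" means an activation strictly above zero (the helper name `activation_density` and the `threshold` parameter are hypothetical), shows that 0.005% corresponds to about 1 active token in 20,000:

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Fraction of tokens whose feature activation exceeds `threshold` (assumed definition)."""
    acts = np.asarray(acts, dtype=float)
    return float(np.mean(acts > threshold))

# Toy sample: 20,000 tokens, the feature fires on exactly one of them.
acts = np.zeros(20_000)
acts[0] = 1.3
density = activation_density(acts)
print(f"{density * 100:.3f}%")  # density expressed as a percentage
```

At a density this low, a modest sample of text may contain no activating tokens at all.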