INDEX
    Explanations
    Negative Logits
     ElementRef
    -0.08
    Applications
    -0.07
    _transaksi
    -0.07
     làm
    -0.07
     đăng
    -0.07
     summoned
    -0.06
    $l
    -0.06
    어가
    -0.06
     بیمه
    -0.06
             
    -0.06
    Positive Logits
     hate
    0.10
     Hate
    0.09
    inous
    0.07
     status
    0.06
    0.06
     OWN
    0.06
    Say
    0.06
    xon
    0.06
    _ROT
    0.06
     "-
    0.06
    Act Density 0.010%

    No Known Activations