    NEGATIVE LOGITS
    Token            Logit
    "toDouble"       -0.06
    "token"          -0.06
    " theaters"      -0.06
    "tgt"            -0.06
    "inet"           -0.06
    "Torrent"        -0.06
    " skillet"       -0.06
    "ulant"          -0.06
    " снова"         -0.06
    " дерева"        -0.05
    POSITIVE LOGITS
    Token            Logit
    "월까지"          0.07
    " Verification"   0.07
    " invit"          0.07
    " tp"             0.07
    " Cyprus"         0.07
    "pane"            0.06
    " crowned"        0.06
    " conferred"      0.06
    "χω"              0.06
    [missing]         0.06
    Activation Density: 0.002%

    No Known Activations