    NEGATIVE LOGITS
    "Ngày"              -0.07
    "=X"                -0.07
    "_RO"               -0.06
    " тради"            -0.06
    " lax"              -0.06
    " hind"             -0.06
    "_notes"            -0.06
    "_successful"       -0.06
    " DirectoryInfo"    -0.06
    "getX"              -0.06
    POSITIVE LOGITS
    " girlfriends"      0.09
    " fiance"           0.08
    "friends"           0.08
    "彼女"              0.08
    " ragazza"          0.08
    " Girlfriend"       0.07
                        0.07
    "birds"             0.07
    " boys"             0.07
    "eurs"              0.07
    Activation Density: 0.015%

    No Known Activations