    Negative Logits
    " intimately"   -0.07
    " zijn"         -0.06
    "يد"            -0.06
    " Le"           -0.06
    " maps"         -0.06
    " gob"          -0.06
    "_tw"           -0.06
    " RG"           -0.06
    "/lo"           -0.06
    " đạt"          -0.06
    Positive Logits
    " граж"         0.07
    " paren"        0.07
    "ουλίου"        0.06
    "URED"          0.06
    " forc"         0.06
    " horm"         0.06
    " Majesty"      0.06
    " CMP"          0.06
    "λω"            0.06
    0.06
    Activation Density: 0.020%

    No Known Activations