    NEGATIVE LOGITS
                    -0.06
     rég            -0.06
    ismo            -0.06
    ь               -0.06
     pró            -0.06
    rieg            -0.06
     City           -0.06
    _CITY           -0.06
     girlfriends    -0.06
                    -0.06
    POSITIVE LOGITS
     Minimal        0.12
     minimal        0.12
    minimal         0.10
     minimalist     0.09
    Minimal         0.08
    ôle             0.07
     minimum        0.07
     minimise       0.07
    _buffer         0.06
    для             0.06
    Act Density 0.005%

    No Known Activations