INDEX
    NEGATIVE LOGITS
     autorytatywna  -1.05
    ]='\            -0.91
     GeoNames       -0.90
     himo           -0.88
     <",            -0.88
    новништво       -0.86
     worauf         -0.86
    ROGEN           -0.86
    ynb             -0.85
    Autoritní       -0.85
    POSITIVE LOGITS
     milf           1.79
     disreg         1.74
     depic          1.74
     inconce        1.69
     encomp         1.67
     maneu          1.66
     intersper      1.64
     increa         1.62
     snoopy         1.60
     suscep         1.60
    Activation Density 0.227%

    No Known Activations