    Negative Logits
    "дом"            -0.08
    " --------"      -0.07
    " Ryan"          -0.07
    " hayır"         -0.07
    "opez"           -0.06
    " برق"           -0.06
    "fff"            -0.06
    " Iranian"       -0.06
    " Hungary"       -0.06
    " Iraq"          -0.06
    Positive Logits
                     0.07
    " tempt"         0.07
    ".Dock"          0.07
    " Catalyst"      0.06
    "alive"          0.06
    "(ar"            0.06
    " Plate"         0.06
    " вас"           0.06
    " battlefield"   0.06
    " oath"          0.06
    Activation Density: 0.291%

    No Known Activations