    Negative Logits
    Token          Logit
    " tolik"       -0.07
    " خارجی"       -0.07
    " Godzilla"    -0.06
    "_End"         -0.06
    "022"          -0.06
    " quieres"     -0.06
    "ificial"      -0.06
    "ених"         -0.06
    "-town"        -0.06
    " menor"       -0.06

    Positive Logits
    Token          Logit
    " Amendment"   0.07
    " tidy"        0.07
    " uLocal"      0.07
    "فة"           0.06
                   0.06
    " tighter"     0.06
    "PLUGIN"       0.06
    ".Percent"     0.06
    "0"            0.06
    " ettir"       0.06
    Activation Density: 0.012%

    No Known Activations