INDEX
    Explanations

    substitutions/replacements

    Negative Logits
     russian         -0.07
    „ط               -0.07
     ww              -0.07
     ellas           -0.07
     özgür           -0.06
     деятельности    -0.06
    Orig             -0.06
     realizar        -0.06
    ılış             -0.06
     Them            -0.06

    Positive Logits
    -scal            0.06
     delaying        0.06
    дается           0.06
    	create          0.06
     число           0.06
    _frm             0.06
    azz              0.05
     nehmen          0.05
     mul             0.05
    icorn            0.05

    Act Density 0.096%

    No Known Activations