INDEX
    Explanations

    the presence of the word "Han" and variations of the word "not."

    Negative Logits
    Token            Logit
    " okuyayım"      -0.68
    " مشين"          -0.68
    " alternates"    -0.67
    " nakalista"     -0.65
    "bagno"          -0.65
    " Theſe"         -0.65
    " Inscrivez"     -0.64
                     -0.64
    " Oester"        -0.63
    " Jefus"         -0.62
    Positive Logits
    Token            Logit
    " не"            0.98
    "Не"             0.72
    " ne"            0.71
    " Не"            0.69
    "enterOuterAlt"  0.69
    " Abp"           0.67
    "ibouti"         0.65
    " imp"           0.64
    "epiece"         0.64
    "riwal"          0.64
    Activation Density: 0.049%

    No Known Activations