INDEX
    Explanations

    negations and phrases indicating absence or contradiction

    Negative Logits
    "GOTREF"    -0.74
    " Theſe"    -0.73
    " δὲ"    -0.70
    " χώ"    -0.69
    "Портали"    -0.69
    "plast"    -0.64
    " τῶν"    -0.63
    "mogat"    -0.63
    " lahat"    -0.63
    "حياته"    -0.62
    Positive Logits
    " не"    1.01
    (token not shown)    0.94
    " Не"    0.91
    " 不"    0.89
    "Не"    0.89
    "有不"    0.84
    "的不"    0.83
    " לא"    0.80
    " НЕ"    0.79
    "他不"    0.76
    Activation Density: 0.041%

    No Known Activations