    Explanations

    Varying topics/articles

    Negative Logits
    Token           Logit
     '\\;'          -0.92
     colorés        -0.87
     définiti       -0.86
     collè          -0.85
     scolaires      -0.85
    Билгалдахарш    -0.84
     فريبيس         -0.84
     fermés         -0.82
     démocr         -0.82
    colgroup        -0.81
    Positive Logits
    Token           Logit
     thanks         0.44
     thank          0.40
     largely        0.37
     Drapeau        0.37
     de             0.36
     DES            0.35
     likely         0.34
     ber            0.33
     in             0.33
     Dior           0.33
    Activation Density: 0.001%

    No Known Activations