    Negative Logits
    " découver"    -0.88
    " flesta"      -0.86
    " varandra"    -0.86
    "OGND"         -0.84
    " stället"     -0.82
    " chrétiens"   -0.81
    " démocr"      -0.79
    " تضيفلها"     -0.78
    " NSCoder"     -0.78
    " متعلقه"      -0.77
    Positive Logits
    " anti"    0.65
    " jug"     0.61
    ","        0.61
    " next"    0.60
    " in"      0.57
    " plot"    0.54
    " worth"   0.53
               0.52
    " inter"   0.52
    " well"    0.52
    Activation Density: 0.049%

    No Known Activations