INDEX
    NEGATIVE LOGITS
    of
    -1.09
    Of
    -1.00
     Of
    -0.84
     financières
    -0.82
     féminine
    -0.63
     mères
    -0.61
     nationales
    -0.57
     chré
    -0.56
     refusé
    -0.55
     pères
    -0.54
    POSITIVE LOGITS
    ")));
    
    0.84
    ]));
    
    0.82
     }}$}
    0.79
    "])
    
    0.79
    "]);
    
    0.77
     the
    0.75
    "]));
    0.73
    '));
    
    0.72
    )");
    
    0.71
    "));
    
    0.71
    Act Density 1.606%

    No Known Activations