INDEX
    Negative Logits
        Token                 Logit
        " protoimpl"          -0.69
        "AnchorStyles"        -0.66
        " fidé"               -0.63
        " wikipagina"         -0.63
        " &___"               -0.62
        " feroit"             -0.62
        "MigrationBuilder"    -0.58
        " définiti"           -0.57
        " quelcon"            -0.57
        "oarece"              -0.56
    Positive Logits
        Token                 Logit
        " to"                 0.57
        " debate"             0.48
        " rest"               0.47
        "<!["                 0.47
        "adox"                0.45
        " hab"                0.45
        "chale"               0.45
        " debating"           0.45
        " gew"                0.45
        " argu"               0.44
    Activation Density: 0.005%

    No Known Activations