INDEX
    Explanations
    Negative Logits
    " Beſ"           -0.88
    " itſelf"        -0.87
    " Theſe"         -0.85
    " Vikipedi"      -0.82
    "myModal"        -0.78
    " proprement"    -0.77
    " ainfi"         -0.77
    " ſche"          -0.76
    "DockStyle"      -0.76
    " imprimée"      -0.75

    Positive Logits
    " non"           2.19
    " Non"           2.08
    "Non"            2.07
    "non"            1.99
    " NON"           1.93
    "NON"            1.70
                     1.68
    " nons"          1.38
    " 非"            1.34
    " Nons"          1.33
    Act Density 0.089%
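
As a rough illustration of how to read the density figure, the sketch below assumes that "Act Density" means the fraction of sampled tokens on which the feature has a nonzero activation. The page itself does not define the term. The function name `activation_density` and the sample data are hypothetical and chosen only to reproduce a value of the same magnitude.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: density is counted per token over a sample of activations.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: 100,000 tokens, of which 89 activate the feature.
sample = [0.0] * 99_911 + [1.5] * 89
density = activation_density(sample)
print(f"{density:.3%}")  # 0.089%
```

Under that assumption, a density of 0.089% corresponds to roughly 89 active tokens per 100,000 sampled.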

    No Known Activations