INDEX
    Explanations

    negations or the word "not."

    Negative Logits
    Token       Logit
    DockStyle   -1.04
    يتيمه       -0.98
    purpoſe     -0.96
    Weyl        -0.96
    uſe         -0.89
    ſtate       -0.89
    homoto      -0.89
    Sopho       -0.87
    Huguen      -0.86
    recto       -0.86
    Positive Logits
    Token       Logit
    is          1.34
    a           1.08
    being       1.02
    not         1.00
    are         0.99
    was         0.98
    Is          0.97
    quite       0.97
    è           0.93
    an          0.92
    Activation Density: 0.102%

    No Known Activations