INDEX
    Explanations

    negations or expressions of contradiction

    Negative Logits
    "DockStyle"          -0.93
    " purpoſe"           -0.91
    "AddTagHelper"       -0.90
    " Jefus"             -0.88
    " myſelf"            -0.88
    " يتيمه"             -0.87
    " ModelExpression"   -0.84
    " disambiguazione"   -0.84
    " pleaſure"          -0.83
    " cauſe"             -0.82
    Positive Logits
    " not"      0.98
    " going"    0.96
    " is"       0.95
    " a"        0.95
    " also"     0.86
    " an"       0.80
    " really"   0.77
    " being"    0.76
    " likely"   0.75
    " WAS"      0.75
    Activation Density: 0.102%

    No Known Activations