INDEX
    Explanations

    negation terms or phrases

    Negative Logits
    Token          Logit
    " myſelf"      -1.56
    " Efq"         -1.51
    " Jefus"       -1.49
    " itſelf"      -1.37
    " ſeveral"     -1.35
    " ſche"        -1.33
    " Reſ"         -1.32
    " pleaſure"    -1.31
    " raiſ"        -1.31
    " purpoſe"     -1.28
    Positive Logits
    Token          Logit
    " not"         1.86
    " Not"         1.40
    "not"          1.39
    "Not"          1.19
    " NOT"         1.18
    " cannot"      1.10
    " nicht"       1.01
    "NOT"          1.01
    "t"            0.96
    " tidak"       0.94
    Activation Density: 0.220%

    No Known Activations
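
    The activation density figure above is commonly read as the share of token positions on which the feature fires. The sketch below illustrates that reading only. The function name `activation_density`, the zero threshold, and the toy data are assumptions made for illustration, not the dashboard's actual computation.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: a position counts as "active" when its activation is strictly
    greater than the threshold (0.0 by default).
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: 1000 token positions, 2 of which activate the feature.
toy_activations = [0.0] * 998 + [1.3, 0.7]
density = activation_density(toy_activations)
print(f"{density:.3%}")  # prints 0.200%
```

    Under this reading, a density of 0.220% would mean the feature is active on roughly 2 of every 1000 sampled tokens.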