INDEX
    Explanations

    phrases indicating a comparison or contrast between different perspectives or approaches

    negations or phrases emphasizing what something is not

    Negative Logits
    Token             Logit
    "*:"              -0.62
    "riber"           -0.60
    "}}}"             -0.60
    "lance"           -0.60
    "stru"            -0.60
    "kees"            -0.58
    "WER"             -0.58
    "FAQ"             -0.58
    "stice"           -0.57
    " [+"             -0.55
    Positive Logits
    Token             Logit
    " necessarily"    1.54
    "epad"            1.16
    "withstanding"    1.09
    "ifying"          0.98
    " merely"         0.95
    "icably"          0.92
    "ifies"           0.91
    " vice"           0.88
    " unlike"         0.86
    " just"           0.85
    Activation Density: 0.059%

    No Known Activations