    Explanations

    negation words

    The neuron activates on words that signal comparison or contrast (e.g. than, less, rather, only, not) or that emphasize degree when describing trade-offs.

    Negative Logits
     tart         -0.08
     Nd           -0.07
     SEM          -0.06
    _types        -0.06
     OCR          -0.06
     Tricks       -0.06
     stab         -0.06
    Compile       -0.06
     kart         -0.06
     Rut          -0.06
    Positive Logits
                  0.07
    ">--}}↵       0.07
     hotelu       0.06
    "};↵          0.06
    quota         0.06
    /system       0.06
     안내           0.06
     getLast      0.06
     DISCLAIMER   0.06
    Available     0.06
    Act Density 0.060%

    No Known Activations