    Explanations

    It seems like the neuron is mainly looking for words related to negation.

    The word "not" and its variations in context.

    Negative Logits
    Token           Logit
    "stakes"        -0.71
    " Circuit"      -0.71
    "itor"          -0.70
    " Spotlight"    -0.69
    " Tycoon"       -0.67
    " Pros"         -0.67
    " Contrast"     -0.65
    " Comparison"   -0.65
    " Expansion"    -0.65
    " Handbook"     -0.64
    Positive Logits
    Token           Logit
    "icably"        1.39
    "epad"          1.20
    "icable"        1.15
    " necessarily"  1.10
    "hin"           1.04
    "orious"        0.95
    "ched"          0.92
    "withstanding"  0.89
    " yet"          0.85
    "ifications"    0.83
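    The two logit lists rank vocabulary tokens by how strongly this neuron appears to push their logits down or up. The page does not say how these scores are computed, so the sketch below is a minimal illustration under an assumption: the direct effect on a token is the dot product of the neuron's output weight vector with that token's unembedding vector, ignoring the final layer norm. The helpers `direct_logit_effect` and `top_logit_tokens` are hypothetical names, not an existing API.

```python
from typing import Dict, List, Tuple

Ranking = List[Tuple[str, float]]


def direct_logit_effect(
    w_out: List[float], w_u: Dict[str, List[float]]
) -> Dict[str, float]:
    """Assumed direct effect of one neuron on each token's logit.

    Assumption (hypothetical): effect = dot(neuron output weights, token
    unembedding vector). Final layer norm and other scaling are ignored.
    """
    return {
        token: sum(a * b for a, b in zip(w_out, column))
        for token, column in w_u.items()
    }


def top_logit_tokens(
    effect: Dict[str, float], k: int = 10
) -> Tuple[Ranking, Ranking]:
    """Return (k most negative, k most positive) tokens by logit effect."""
    ascending = sorted(effect.items(), key=lambda item: item[1])
    negative = ascending[:k]  # most negative first
    positive = ascending[::-1][:k]  # most positive first
    return negative, positive


# Usage with a handful of values copied from the tables above
# (tokens keep their leading spaces).
sample = {"stakes": -0.71, " Circuit": -0.71, "icably": 1.39, "epad": 1.20, " yet": 0.85}
negative, positive = top_logit_tokens(sample, k=2)
print(positive)  # [('icably', 1.39), ('epad', 1.2)]
```

    Keeping leading spaces in the token strings matters: " Circuit" and "Circuit" are different vocabulary entries, which is why the tables quote each token.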
    Activation Density 0.171%
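    Activation density is presumably the share of sampled tokens on which the neuron fires. The minimal sketch below assumes "fires" means an activation strictly above a threshold of 0. The threshold actually behind the 0.171% figure is not stated on this page, and `activation_density` is a hypothetical helper.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation is strictly above `threshold`.

    Assumption (hypothetical): a token counts as activating the neuron when
    its activation > threshold; the real threshold is not given on the page.
    """
    acts = list(activations)
    if not acts:
        return 0.0
    firing = sum(1 for a in acts if a > threshold)
    return 100.0 * firing / len(acts)


# 171 firing tokens out of 100,000 reproduces this neuron's density.
acts = [0.0] * 99_829 + [1.0] * 171
print(f"{activation_density(acts):.3f}%")  # 0.171%
```

    At 0.171%, the neuron would fire on roughly 171 of every 100,000 tokens, so it is quite sparse.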

    No Known Activations