    Explanations

    The main thing this neuron does is find phrases related to negative consequences or issues.

    Words related to problems or challenges.

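    Negative Logits (token, logit; quotes show the leading space that is part of each token)
      " classy"       -0.77
      " excel"        -0.76
      " gifted"       -0.74
      " cultured"     -0.73
      " sublime"      -0.71
      " proudly"      -0.71
      " supreme"      -0.70
      " orally"       -0.70
      " eleg"         -0.70
      " fictional"    -0.70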
    Positive Logits (token, logit)
      "ruption"       1.25
      "urrence"       1.16
      "activation"    1.15
      "issions"       1.12
      "aution"        1.12
      "downs"         1.12
      "rification"    1.09
      "illation"      1.07
      "amping"        1.05
      "gradation"     1.04
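    Logit lists like these are commonly produced by projecting a neuron's output weights onto each token's unembedding vector and sorting the results. This page does not say how its values were computed, so the following is a minimal sketch under that assumption. The names w_out and w_u and the four-token vocabulary are hypothetical toy data, and the sketch ignores the final layer norm that a real model applies before unembedding.

```python
from typing import Dict, List, Tuple


def logit_effects(w_out: List[float], w_u: Dict[str, List[float]]) -> Dict[str, float]:
    """Dot the neuron's output direction with each token's unembedding vector.

    Assumption: this direct projection approximates the page's logit values;
    a real model would also apply its final layer norm first.
    """
    return {tok: sum(a * b for a, b in zip(w_out, vec)) for tok, vec in w_u.items()}


def top_and_bottom(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative tokens."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    bottom = ranked[:k]        # most negative first
    top = ranked[::-1][:k]     # most positive first
    return top, bottom


# Hypothetical toy data: a 2-dimensional residual stream and 4 tokens.
w_out = [1.0, -0.5]
w_u = {
    "ruption": [1.0, -0.25],
    " classy": [-0.5, 0.5],
    "downs": [0.5, 0.0],
    " gifted": [0.0, 1.0],
}

top, bottom = top_and_bottom(logit_effects(w_out, w_u), k=2)
print("positive:", top)
print("negative:", bottom)
```

    With this toy data the positive list is ruption (1.125) then downs (0.5), and the negative list is " classy" (-0.75) then " gifted" (-0.5).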
    Activation Density: 0.395% (the share of tokens on which the neuron activates)
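    A density figure like this is a percentage of tokens. The following minimal sketch assumes a token counts as "active" when its activation is strictly above a threshold of 0; the threshold actually used for this page is not stated, and the activation values below are hypothetical toy data, not this neuron's activations.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (assumed 0)."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Hypothetical activations over 8 tokens; only one is above 0.
acts = [0.0, 0.0, 2.3, 0.0, -0.1, 0.0, 0.0, 0.0]
print(activation_density(acts))  # 12.5
```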

    No Known Activations