    Explanations

    The neuron appears to respond to words related to criticism or negativity.

    Negative Logits
    Token          Logit
    "agher"        -1.16
    "reon"         -1.01
    "ulhu"         -1.00
    "orate"        -0.98
    " Ancients"    -0.95
    "arians"       -0.95
    "inosaur"      -0.95
    "ICAN"         -0.94
    "orians"       -0.94
    " Mant"        -0.94

    Positive Logits
    Token          Logit
    "ball"          1.58
    "ening"         1.47
    "hearted"       1.41
    "ener"          1.33
    "grass"         1.32
    "eners"         1.25
    "cover"         1.25
    " palate"       1.23
    "heart"         1.21
    "balls"         1.16
    Activation Density: 0.873%

    No Known Activations