    Explanations

    The neuron responds to occurrences of the word “ban” (i.e. calls for prohibitions).

    NEGATIVE LOGITS
    "Coeff"            -0.08
    " thorough"        -0.08
    "heart"            -0.08
    (not rendered)     -0.07
    " Elliot"          -0.07
    " Heart"           -0.07
    "Wood"             -0.07
    " Cardio"          -0.07
    "uco"              -0.07
    "Coefficient"      -0.07
    POSITIVE LOGITS
    " ban"             0.14
    " Ban"             0.12
    " banned"          0.12
    " banning"         0.10
    "Ban"              0.10
    " bans"            0.09
    "AN"               0.08
    " raids"           0.08
    " nam"             0.07
    " outlaw"          0.07
    Activation density: 0.007%
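    A density of 0.007% is a fraction of 7e-5, roughly 7 firing tokens per 100,000. The sketch below assumes "density" means the share of token positions whose activation exceeds a threshold; the helper name `activation_density` and the default threshold of 0.0 are assumptions, not taken from the dashboard.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of token positions where the neuron fires.

    Assumption: "fires" means activation > threshold (hypothetical default 0.0).
    """
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Synthetic example: 7 firing positions out of 100,000 tokens.
acts = [0.0] * 100_000
for i in range(7):
    acts[i] = 1.0

density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.007%
```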

    No Known Activations