INDEX
    Explanations

    negative sentiment/arguments

    This neuron detects passive‐voice “is used by” constructions emphasizing legitimate or dual usage (e.g. “is also used by citizens”).

    Negative Logits
    Token                   Logit
    "-states"               -0.06
    " harmful"              -0.06
                            -0.06
    "inds"                  -0.06
    "Nine"                  -0.06
    " riches"               -0.06
    ".assertAlmostEqual"    -0.06
    "303"                   -0.06
    "ircular"               -0.06
    " Kath"                 -0.06
    Positive Logits
    Token                   Logit
    "-part"                 0.06
    "анк"                   0.06
    " crunchy"              0.06
    " vile"                 0.06
    " compounded"           0.06
    "iflower"               0.06
                            0.06
    " butcher"              0.06
    "Spacer"                0.06
    " tucked"               0.06
    Activation Density: 0.155%

    No Known Activations