Explanations
This neuron responds to mentions of people involved in wrongdoing (like “victim,” “perpetrator,” “person responsible”) and the consequences or outcomes of harmful actions.
Negative Logits
GameController  -0.06
Gotham  -0.06
Carousel  -0.06
चरण  -0.06
ebay  -0.06
jugg  -0.06
Titanic  -0.06
ández  -0.06
南  -0.06
(token not shown)  -0.06
Positive Logits
dequeue  0.06
люч  0.06
.defer  0.06
creditor  0.06
bei  0.06
zav  0.06
-navbar  0.06
).(  0.06
duy  0.06
Opp  0.06
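The page does not say how these logit lists were produced. A common approach, assumed here, is direct logit attribution: project the neuron's output weight vector through the unembedding matrix and rank vocabulary tokens by the resulting score, keeping the k most negative and k most positive. The sketch below is a minimal pure-Python version under that assumption; `neuron_logit_effects`, `w_out`, and `w_u` are hypothetical names, and it ignores the final layer norm, which a real pipeline may fold into the weights.

```python
from typing import List, Sequence, Tuple


def neuron_logit_effects(
    w_out: Sequence[float],
    w_u: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (bottom_k, top_k) tokens ranked by a neuron's direct logit effect.

    Assumption (not stated by the source page): effect[t] is the dot product
    of the neuron's output weights with column t of the unembedding matrix,
    with no layer-norm folding.
    """
    d_model = len(w_out)
    if len(w_u) != d_model:
        raise ValueError("w_u must have one row per residual-stream dimension")

    # Score every vocabulary token: sum_i w_out[i] * w_u[i][t].
    effects = [
        (token, sum(w_out[i] * w_u[i][t] for i in range(d_model)))
        for t, token in enumerate(vocab)
    ]

    # Most negative first, then most positive first.
    bottom = sorted(effects, key=lambda pair: pair[1])[:k]
    top = sorted(effects, key=lambda pair: pair[1], reverse=True)[:k]
    return bottom, top


# Toy example with made-up weights and tokens (not this neuron's real data).
bottom, top = neuron_logit_effects(
    w_out=[1.0, -0.5],
    w_u=[[0.2, -0.1, 0.0], [0.0, 0.4, -0.2]],
    vocab=["a", "b", "c"],
    k=1,
)
print("negative:", bottom)
print("positive:", top)
```

Sorting the whole vocabulary is fine for a sketch; with a vocabulary of tens of thousands of tokens, `heapq.nsmallest` and `heapq.nlargest` (both accept a `key=` argument) would avoid the full sort.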
Activation Density: 0.029%
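Activation density is the share of token positions on which the neuron fires. Assuming "fires" means an activation strictly above 0 (the page states neither the threshold nor the dataset), 0.029% works out to roughly 29 active tokens per 100,000. A minimal sketch of the calculation, with `activation_density` as a hypothetical helper name:

```python
from typing import Iterable, Sequence


def activation_density(
    activations: Iterable[Sequence[float]], threshold: float = 0.0
) -> float:
    """Fraction of token positions whose activation exceeds `threshold`.

    `activations` yields one sequence of per-token activations per document.
    The default threshold of 0.0 is an assumption, not taken from the page.
    """
    total = 0
    active = 0
    for doc in activations:
        for value in doc:
            total += 1
            if value > threshold:
                active += 1
    if total == 0:
        raise ValueError("no tokens to measure")
    return active / total


# 29 active positions out of 100,000 reproduces the 0.029% figure format.
doc = [1.0] * 29 + [0.0] * (100_000 - 29)
print(f"{activation_density([doc]):.3%}")
```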