INDEX
Explanations
This neuron fires whenever the word “slave” appears in the text.
New Auto-Interp
Negative Logits
yazar
-0.08
imuth
-0.08
collided
-0.08
eldig
-0.07
.point
-0.07
emitted
-0.07
Mood
-0.07
.console
-0.07
Booking
-0.07
Morrison
-0.06
POSITIVE LOGITS
slave
0.13
slaves
0.13
Slave
0.12
Slave
0.09
slave
0.08
slavery
0.08
奴
0.08
py
0.07
раб
0.07
enslaved
0.07
Activations Density 0.006%