Explanations
The neuron strongly activates on tokens related to torture, mutilation, and other forms of extreme violence.
Negative Logits
itudes       -0.06
眼           -0.06
reminiscent  -0.06
niž          -0.06
pragmatic    -0.06
फर           -0.06
ประเทศไทย    -0.06
.radians     -0.06
ільш         -0.06
azi          -0.06
Positive Logits
torture      0.13
tortured     0.11
Tort         0.08
Async        0.07
Palace       0.07
Whole        0.07
orta         0.07
Cort         0.07
Circuit      0.07
sous         0.06
Activations Density: 0.003%