Explanations
Hallucinogens and psychedelics
Content related to categories of unsafe and inappropriate material, particularly sexual content and drug references.
The neuron activates on mentions of hallucinogenic drug terminology.
Negative Logits
lint      -0.08
wires     -0.07
функци    -0.07
logout    -0.07
р         -0.07
Manager   -0.07
์)        -0.06
"In       -0.06
ツ        -0.06
rikes     -0.06
Positive Logits
하는데         0.07
psychedelic   0.06
(ARG          0.06
els           0.06
tuned         0.06
offen         0.06
umbling       0.06
(sent         0.06
Qing          0.06
görül         0.06
Activations Density: 0.002%