INDEX
Explanations
suffering, evil
The neuron activates on words and phrases referring to human suffering, pain, loss, or injustice.
Negative Logits
タ            -0.07
段            -0.06
belle        -0.06
find         -0.06
ided         -0.06
лор          -0.06
.met         -0.06
.streaming   -0.06
intend       -0.06
podstat      -0.06
Positive Logits
fellow               0.07
variably             0.07
DeltaTime            0.06
gerek                0.06
реєстра              0.06
ظٹ                   0.06
Neg                  0.06
流量                 0.06
.addActionListener   0.06
selfie               0.06
Activations Density: 0.006%