Explanations
words related to empathy and sympathy
Neuron Alignment
Index   Value   % of L₁
197     +0.12   0.4%
1363    +0.11   0.4%
1129    +0.11   0.3%
Correlated Neurons
Index   Pearson Corr.   Cos Sim.
1363    +0.12           0.04
197     +0.11           0.03
1601    +0.11           0.03
Negative Logits
disagre     -0.85
accla       -0.81
gmbh        -0.81
apprehen    -0.81
dises       -0.78
indestru    -0.77
affor       -0.77
excru       -0.77
reluct      -0.76
Mua         -0.76
Positive Logits
compassion      0.95
sympathy        0.91
compassionate   0.83
Compassion      0.79
empathy         0.78
Compassion      0.77
mercy           0.76
sympathetic     0.70
kindness        0.70
empathetic      0.68
Activations Density: 0.174%