INDEX
Explanations
names of individuals, particularly doctors and people mentioned in the text
Neuron Alignment
Index    Value    % of L₁
1150     +0.22    0.7%
227      +0.12    0.4%
2034     +0.10    0.3%
Correlated Neurons
Index    P. Corr.    Cos Sim.
981      +0.22       0.07
227      +0.12       0.05
1097     +0.10       0.05
Negative Logits
increa      -2.13
affor       -2.10
desir       -2.09
inev        -2.06
volunte     -2.04
unden       -1.99
guarante    -1.99
fuf         -1.98
emphat      -1.98
purcha      -1.97
Positive Logits
.     1.15
;     0.97
。    0.93
).    0.88
,     0.88
."    0.83
!     0.82
:     0.81
.)    0.81
);    0.81
Activations Density 0.201%