Explanations
Mentions of health-insurance-related terms and policies, particularly around enrollment and coverage.
Neuron Alignment
Index   Value   % of L₁
1842    +0.14   0.4%
1166    +0.10   0.3%
964     +0.09   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
284     +0.14      0.08
1553    +0.10      0.08
942     +0.09      0.07
Negative Logits
Token      Logit
encomp     -1.51
desir      -1.50
depic      -1.49
effe       -1.41
guarante   -1.39
inev       -1.39
perfet     -1.38
suscep     -1.37
eyel       -1.37
unden      -1.37
Positive Logits
Token        Logit
despite      0.70
themselves   0.70
regardless   0.63
while        0.61
because      0.61
either       0.59
unless       0.58
due          0.58
Their        0.58
during       0.57
Activations Density 0.685%