INDEX
Explanations
mentions of welfare programs or related social assistance initiatives
Neuron Alignment
Index   Value   % of L₁
1937    +0.13   0.6%
67      +0.13   0.6%
411     +0.12   0.5%
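The "% of L₁" column is not defined on this page. A minimal sketch of one plausible reading, assuming it is each neuron's absolute weight divided by the L1 norm of the whole weight vector (the function name `percent_of_l1` and the toy vector are hypothetical, not taken from the dashboard):

```python
def percent_of_l1(weights):
    """Return each weight's share of the vector's L1 norm, in percent.

    Assumption: "% of L1" means |w_i| / sum_j |w_j| * 100.
    """
    l1_norm = sum(abs(w) for w in weights)
    if l1_norm == 0:
        raise ValueError("L1 norm is zero; shares are undefined")
    return [100.0 * abs(w) / l1_norm for w in weights]


# Toy vector for illustration only; these are not the dashboard's weights.
toy_weights = [3.0, -1.0, 0.0, 4.0]
shares = percent_of_l1(toy_weights)
print(shares)  # [37.5, 12.5, 0.0, 50.0]
```

Under that reading, a value of +0.13 at 0.6% would imply a total L1 norm on the order of 20, though the displayed figures are rounded and the estimate is correspondingly loose.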
Correlated Neurons
Index   P. Corr.   Cos Sim.
281     +0.13      0.02
1937    +0.13      0.02
67      +0.12      0.02
Negative Logits
ihnachten   -0.55
Pand        -0.54
Pand        -0.54
Montague    -0.49
Lichten     -0.47
McInt       -0.47
Morg        -0.46
Vat         -0.46
Schn        -0.46
Schlu       -0.46
Positive Logits
welfare   1.14
Welfare   1.14
welfare   1.10
Welfare   1.07
WELFARE   0.96
elfare    0.92
Wells     0.76
Wel       0.73
Wells     0.69
wel       0.68
Activations Density 0.167%