INDEX
Explanations
key phrases related to social, political, and academic discussions
Neuron Alignment
Index    Value    % of L₁
1870     +0.10    0.3%
1978     +0.08    0.2%
994      +0.08    0.2%
Correlated Neurons
Index    P. Corr.    Cos Sim.
284      +0.10       0.06
2010     +0.08       0.05
1657     +0.08       0.04
Negative Logits
Intere      -1.15
depic       -1.14
Confe       -1.10
volunte     -1.09
reluct      -1.09
Manufact    -1.08
timately    -1.07
inev        -1.06
encomp      -1.05
guarante    -1.05
Positive Logits
into                0.92
<bos>               0.83
onto                0.82
beginnetje          0.79
RegressionTest      0.71
Paglinawan          0.69
into                0.69
FunctionFlags       0.68
MigrationBuilder    0.67
velkommen           0.66
Activations Density 0.417%