INDEX
Explanations
words related to customer feedback about websites or products
Neuron Alignment
Index   Value   % of L₁
50      +0.23   1.0%
1480    +0.10   0.4%
1548    +0.10   0.4%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1511    +0.23      -0.01
876     +0.10      -0.12
131     +0.10      0.06
Negative Logits
<bos>         -3.70
Distrikt      -0.96
,             -0.94
RuleContext   -0.93
<th>          -0.92
util          -0.91
/**           -0.90
—             -0.89
/**           -0.89
//            -0.89
Positive Logits
reluct        3.61
affor         3.49
increa        3.48
shenan        3.48
maneu         3.41
disagre       3.41
impra         3.33
depic         3.29
unlaw         3.24
unwarran      3.18
Activations Density 6.964%