INDEX
Explanations
phrases indicating a problem or issue being overlooked or ignored
Neuron Alignment
Index   Value   % of L₁
50      +0.27   0.9%
1343    +0.10   0.3%
1639    +0.08   0.3%
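The source does not define the "% of L₁" column. One plausible reading, stated here as an assumption, is that it gives each neuron's absolute value as a share of the L₁ norm (the sum of absolute values) of the whole vector. A minimal sketch of that reading, using a hypothetical function name and toy numbers rather than the real feature weights:

```python
def l1_share(weights, index):
    """Return |weights[index]| as a fraction of the L1 norm of `weights`.

    Assumption: this mirrors what the "% of L1" column reports;
    the dashboard itself does not document the formula.
    """
    l1_norm = sum(abs(w) for w in weights)
    if l1_norm == 0:
        raise ValueError("L1 norm is zero; share is undefined")
    return abs(weights[index]) / l1_norm


# Toy vector for illustration only, not data from the dashboard.
toy_weights = [0.5, -0.25, 0.25, -1.0]
share = l1_share(toy_weights, 0)
print(f"{share:.1%}")  # prints 25.0%
```

Under this reading, a value of +0.27 at 0.9% would imply an L₁ norm of roughly 30 for the full vector, which is consistent with the other two rows once rounding is taken into account.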
Correlated Neurons
Index   Pearson Corr.   Cos Sim.
1525    +0.27           0.04
1171    +0.10           0.03
533     +0.08           0.04
Negative Logits
Token         Logit
<bos>         -1.64
intersper     -0.97
apprehen      -0.84
disbur        -0.79
vainly        -0.77
attemp        -0.75
renounced     -0.74
lovel         -0.74
reconno       -0.73
interposed    -0.72
Positive Logits
Token         Logit
soggior       +1.15
cavallo       +1.02
paillettes    +1.01
bicic         +0.96
cioc          +0.94
ristor        +0.94
broderie      +0.94
palio         +0.93
frambo        +0.92
ویکیپدیای     +0.92
Activations Density: 0.565%