INDEX
Explanations
phrases or words related to ignoring or dismissal
Neuron Alignment
Index   Value   % of L₁
757     +0.12   0.4%
1548    +0.12   0.4%
197     +0.12   0.4%
Correlated Neurons
Index   P. Corr.   Cos Sim.
757     +0.12      0.03
197     +0.12      0.03
1548    +0.12      0.02
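The P. Corr. and Cos Sim. columns presumably report a Pearson correlation and a cosine similarity between this feature and each listed neuron. The page does not say how either is computed, so the following is only a minimal sketch of the two standard definitions. The function names and the sample vectors are hypothetical and are not taken from the dashboard.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (standard definition)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the mean (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (standard definition)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical activations of one feature and one neuron over five tokens.
feature_acts = [0.0, 0.0, 1.2, 0.0, 0.9]
neuron_acts = [0.1, -0.3, 0.4, 0.2, 0.0]
print(round(pearson(feature_acts, neuron_acts), 3))

# Hypothetical direction vectors (e.g. a feature direction and a neuron basis vector).
print(round(cosine([1.0, 2.0, 0.0], [0.0, 1.0, 1.0]), 3))
```

Under these definitions, Pearson correlation compares how two activation series co-vary across inputs, while cosine similarity compares the directions of two weight vectors, so the two columns need not agree in magnitude.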
Negative Logits
IPMENT        -0.48
BaseAdapter   -0.41
she           -0.41
Statistiche   -0.41
Mu            -0.40
KOK           -0.40
ื้น           -0.40
<bos>         -0.39
VP            -0.39
lease         -0.39
Positive Logits
Ignoring   0.88
ignore     0.85
Ignore     0.82
ignore     0.82
poff       0.81
ignoring   0.81
ignored    0.80
ignores    0.80
ecru       0.78
madonna    0.78
Activations Density 0.098%