INDEX
Explanations
references to government policies and reports
Neuron Alignment
Index   Value   % of L₁
1741    +0.28   1.0%
50      +0.23   0.8%
227     +0.17   0.6%
Correlated Neurons
Index   P. Corr.   Cos Sim.
16      +0.28      0.09
50      +0.23      0.06
1967    +0.17      0.06
Negative Logits
lmfao        -0.81
anteriorly   -0.71
purée        -0.71
overcrow     -0.69
teljesen     -0.67
erős         -0.67
herido       -0.65
všem         -0.64
subgoals     -0.63
inkább       -0.63
Positive Logits
Chá      1.15
Nö       1.12
préc     1.02
Jä       1.02
Bâ       1.01
gouver   1.00
Mâ       0.99
Præ      0.99
pié      0.98
Câ       0.98
Activations Density 0.391%