INDEX
Explanations
phrases related to questioning established beliefs or practices
Neuron Alignment
Index   Value   % of L₁
1380    +0.12   0.3%
509     +0.11   0.3%
946     +0.11   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
509     +0.12      0.03
1380    +0.11      0.02
81      +0.11      0.01
Negative Logits
mondeo      -0.98
jetta       -0.96
broderie    -0.91
eyel        -0.91
desir       -0.91
philips     -0.90
bandeau     -0.87
drap        -0.86
tricot      -0.85
skoda       -0.85
Positive Logits
xffffffff   0.64
xffffff     0.57
xffff       0.53
=>'         0.52
+='         0.50
xFFFFFF     0.49
[`          0.49
}/*         0.49
},{         0.48
ECONDS      0.47
Activations Density 0.128%
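The density figure above is not defined on this page. A common reading, assumed here, is the fraction of sampled tokens on which the feature's activation is non-zero. The sketch below illustrates that reading only; the function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and do not come from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the feature
    fires at all, which is one common definition, not a confirmed one.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical activations for 1,000 tokens: one token fires, the rest are silent.
sample = [0.0] * 999 + [2.5]
density = activation_density(sample)
print(f"{density:.3%}")  # prints "0.100%"
```

Under this reading, a density of 0.128% would mean the feature fires on roughly one token in every 780.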