INDEX
Explanations
phrases starting with 'with'
Neuron Alignment
Index   Value   % of L₁
1145    +0.11   0.3%
554     +0.09   0.3%
971     +0.09   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1081    +0.11      0.03
1166    +0.09      0.03
16      +0.09      0.03
Negative Logits
CÁ            -0.53
ynb           -0.48
Amicalement   -0.46
học           -0.46
làm           -0.46
Giao          -0.45
Quy           -0.44
Nhi           -0.43
Ngh           -0.43
endosi        -0.43
Positive Logits
territo       0.70
SUDOC         0.69
Oester        0.64
REACTOR       0.62
CONDUIT       0.62
maer          0.61
ohr           0.61
aen           0.60
haviour       0.60
zoll          0.60
Activations Density: 0.118%