INDEX
Explanations
phrases related to actions or capabilities
Neuron Alignment
Index   Value   % of L₁
1387    +0.11   0.3%
1505    +0.10   0.3%
1634    +0.10   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1413    +0.11      0.04
1823    +0.10      0.04
1387    +0.10      0.04
Negative Logits
chiha            -0.54
FlatAppearance   -0.52
Bahía            -0.51
opc              -0.50
patata           -0.50
urg              -0.49
ждую             -0.49
ecuador          -0.48
€)               -0.48
Viana            -0.48
Positive Logits
pintadas      0.54
safely        0.53
decoradas     0.52
contribué     0.50
calyx         0.49
<bos>         0.49
pouvaient     0.48
незавершена   0.47
konden        0.47
Савезне       0.46
Activations Density: 0.609%