INDEX
Explanations
verbs or verb phrases related to performing actions or activities
Neuron Alignment
Index   Value   % of L₁
1671    +0.14   0.5%
1256    +0.10   0.4%
1742    +0.10   0.3%
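The "% of L₁" column is not defined on this page. One plausible reading, assumed here and not confirmed by the source, is that it reports each neuron's absolute weight as a share of the L1 norm of the full weight vector. A minimal sketch of that calculation, using a hypothetical helper `l1_share` and a toy vector that is not taken from the dashboard:

```python
def l1_share(weights, index):
    """Return |weights[index]| as a fraction of the L1 norm of `weights`.

    Assumption: this is what "% of L1" measures; the page does not say so.
    """
    l1_norm = sum(abs(w) for w in weights)
    if l1_norm == 0:
        raise ValueError("L1 norm is zero; the share is undefined")
    return abs(weights[index]) / l1_norm


# Hypothetical toy vector, chosen only to make the arithmetic easy to follow.
toy_weights = [0.5, -0.25, 0.25]
print(f"{l1_share(toy_weights, 0):.1%}")  # prints 50.0%
```

Under this reading, a value of +0.14 at 0.5% would imply that the weight is spread thinly across many neurons, with no single neuron dominating.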
Correlated Neurons
Index   P. Corr.   Cos Sim.
1671    +0.14      0.05
331     +0.10      0.04
1256    +0.10      0.03
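The two columns above report different similarity measures. The page does not state which vectors are being compared, so the sketch below only illustrates the measures themselves: Pearson correlation is cosine similarity applied after subtracting each vector's mean. The function names and sample data are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def pearson_correlation(a, b):
    """Pearson correlation: cosine similarity of the mean-centered vectors."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    centered_a = [x - mean_a for x in a]
    centered_b = [y - mean_b for y in b]
    return cosine_similarity(centered_a, centered_b)


# Hypothetical sample data, not taken from the dashboard.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]
print(round(pearson_correlation(xs, ys), 6))
print(round(cosine_similarity([1.0, 0.0], [0.0, 1.0]), 6))
```

Because Pearson correlation removes the mean first, the two numbers can differ substantially for the same pair of vectors, which is why a dashboard may list both.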
Negative Logits
€€           -0.73
sizePolicy   -0.65
Zgod         -0.61
cassert      -0.61
fcntl        -0.60
pymysql      -0.57
smtplib      -0.57
Въ           -0.56
stdarg       -0.55
hashlib      -0.54
Positive Logits
intersper    1.08
uninten      0.96
shenan       0.92
maneu        0.92
Here         0.88
Here         0.85
affor        0.85
scrat        0.83
increa       0.83
fortn        0.83
Activations Density 0.055%