INDEX
Explanations
phrases related to personal and physical actions
Neuron Alignment
Index   Value   % of L₁
394     +0.22   0.7%
50      +0.17   0.5%
453     +0.13   0.4%
Correlated Neurons
Index   P. Corr.   Cos Sim.
394     +0.22      0.07
658     +0.17      0.08
453     +0.13      0.07
Negative Logits
pama        -0.92
guma        -0.86
<bos>       -0.84
susun       -0.82
ilang       -0.79
katun       -0.77
tanong      -0.76
maging      -0.75
membrance   -0.74
pinak       -0.74
Positive Logits
pregn        0.99
reluct       0.98
himself      0.98
himself      0.97
disreg       0.96
peppa        0.95
michelin     0.93
unden        0.92
intermitt    0.91
impra        0.89
Activations Density 0.687%