INDEX
Explanations
instances of phrases related to a specific concept or argument present in the text
Neuron Alignment
Index    Value    % of L₁
872      +0.09    0.3%
316      +0.09    0.2%
382      +0.07    0.2%
Correlated Neurons
Index    P. Corr.    Cos Sim.
236      +0.09       0.03
576      +0.09       0.02
1750     +0.07       0.02
Negative Logits
Certific     -0.53
Mó           -0.48
oncesto      -0.47
fetchall     -0.47
Vegeu        -0.46
AllowUser    -0.46
EFAULT       -0.45
Comun        -0.44
Rujuakan     -0.44
Nuestra      -0.44
Positive Logits
indestru     1.26
shenan       1.19
caprice      1.14
reluct       1.14
milf         1.10
excru        1.08
intersper    1.08
unce         1.08
perfet       1.07
unden        1.07
Activations Density: 0.145%