INDEX
Explanations
words related to security and hierarchy
Neuron Alignment
Index   Value   % of L₁
1870    +0.11   0.3%
1839    +0.09   0.3%
1535    +0.08   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
919     +0.11      0.04
300     +0.09      0.07
124     +0.08      0.05
Negative Logits
BIBSYS           -0.77
braith           -0.72
userSchema       -0.66
تضيفلها          -0.65
ProductService   -0.61
strto            -0.61
ulaski           -0.59
ApiService       -0.58
eclamp           -0.58
astricht         -0.58
Positive Logits
tldr      0.80
viendra   0.74
prendra   0.59
eût       0.58
Ename     0.58
ouvre     0.57
Derp      0.57
aspect    0.56
Yeet      0.55
ferait    0.55
Activations Density: 0.468%