Explanations
Topics related to controversial issues, beginner-friendly documentation, persuasive arguments, fragility and smallness, awe, interconnectedness, and commonality
Neuron Alignment
Index   Value   % of L₁
674     +0.11   0.3%
1042    +0.09   0.2%
208     +0.07   0.2%
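The "% of L₁" column appears to express each neuron's weight as a share of the L₁ norm of the feature's full weight vector. For example, +0.11 at 0.3% implies an L₁ norm of roughly 37. The sketch below assumes that definition and also assumes the table lists the components with the largest absolute weight. The helper name `top_neuron_alignment` and its input `decoder_vec` are hypothetical, not part of any dashboard API.

```python
import numpy as np

def top_neuron_alignment(decoder_vec, k=3):
    """Return (index, value, percent_of_l1) for the k components of
    decoder_vec with the largest absolute value (hypothetical helper)."""
    vec = np.asarray(decoder_vec, dtype=float)
    l1 = np.abs(vec).sum()  # L1 norm of the whole weight vector
    # Stable sort by descending magnitude so ties keep their original order.
    order = np.argsort(-np.abs(vec), kind="stable")[:k]
    return [(int(i), float(vec[i]), float(100.0 * abs(vec[i]) / l1)) for i in order]

# Toy vector chosen so the L1 norm is exactly 1.0.
rows = top_neuron_alignment([0.5, -0.125, 0.25, 0.125], k=2)
print(rows)  # [(0, 0.5, 50.0), (2, 0.25, 25.0)]
```

Under this definition a neuron holding 0.3% of the L₁ mass contributes very little on its own, which suggests the feature is not aligned with any single neuron.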
Correlated Neurons
Index   Pearson Corr.   Cosine Sim.
1466    +0.11           0.03
178     +0.09           0.03
598     +0.07           0.03
Negative Logits
Kör             -0.74
Enlaces         -0.74
Palmar          -0.69
Pä              -0.66
كومونز          -0.66
TypedValue      -0.65
Trả             -0.65
insuffisamment  -0.62
cuidado         -0.61
Fö              -0.60
Positive Logits
increa   1.47
thut     1.46
fta      1.43
scrat    1.42
reft     1.42
ftu      1.42
hairc    1.41
depic    1.41
unve     1.39
snoopy   1.39
Activation Density 0.107%
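Activation density is commonly defined as the percentage of tokens on which a feature has a nonzero activation. At 0.107%, this feature would fire on roughly one token in 935. The sketch below assumes that definition over a flat array of per-token activations. The helper `activation_density` and its `threshold` parameter are hypothetical, and the dashboard's actual token sample is not stated here.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation exceeds threshold (hypothetical helper)."""
    acts = np.asarray(activations, dtype=float)
    # Count tokens where the feature "fires", then divide by the total token count.
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size

# Two of eight tokens fire, so the density is 25%.
print(activation_density([0.0, 0.0, 0.5, 0.0, 1.2, 0.0, 0.0, 0.0]))  # 25.0
```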