Explanations
user comments and interactions on social media platforms
Neuron Alignment
Index    Value    % of L₁
381      +0.14    0.5%
1961     +0.13    0.5%
1077     +0.12    0.5%
Correlated Neurons
Index    P. Corr.    Cos Sim.
1343     +0.14       0.05
1978     +0.13       0.04
1961     +0.12       0.02
Negative Logits
makro      -1.75
utop       -1.58
moza       -1.57
gesta      -1.47
hek        -1.47
plak       -1.47
solidar    -1.47
gero       -1.47
elek       -1.47
adal       -1.44
Positive Logits
pamph         1.84
unwarran      1.78
unlaw         1.72
tolerably     1.69
impractica    1.64
ecru          1.62
hairc         1.62
Ename         1.60
disagre       1.59
liberality    1.59
Activations Density 0.419%