INDEX
Explanations
links to news articles or social media posts
Neuron Alignment
Index   Value   % of L₁
1343    +0.23   0.7%
1177    +0.14   0.4%
876     +0.14   0.4%
Correlated Neurons
Index   P. Corr.   Cos Sim.
981     +0.23      0.03
1343    +0.14      0.03
1786    +0.14      0.02
Negative Logits
unspeak     -1.59
vainly      -1.48
impelled    -1.32
apprehen    -1.30
gaily       -1.23
indescri    -1.21
ineffec     -1.15
mischie     -1.12
tolerably   -1.11
nobly       -1.10
Positive Logits
tass           1.29
inol           1.17
alkoh          1.16
sappi          1.13
abbra          1.11
solidar        1.10
territoriale   1.09
utop           1.08
kosme          1.08
pittores       1.06
Activations Density 0.081%