INDEX
Explanations
Twitter usernames and related text content
Neuron Alignment
Index   Value   % of L₁
1343    +0.21   0.6%
674     +0.10   0.3%
876     +0.09   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1343    +0.21      0.04
924     +0.10      0.03
99      +0.09      0.02
Negative Logits
gaily          -1.06
unspeak        -1.01
endeavouring   -0.95
impelled       -0.92
reconno        -0.90
intrigu        -0.87
vainly         -0.87
tolerably      -0.83
apprehen       -0.83
unceasing      -0.81
Positive Logits
Wikiquote    0.67
soggior      0.64
ostante      0.61
garantis     0.60
MENAFN       0.56
Wikisource   0.56
incontr      0.56
affez        0.56
auguri       0.56
sedia        0.56
Activations Density 0.054%