INDEX
Explanations
tweets or mentions of Twitter activity
Neuron Alignment
Index   Value   % of L₁
1296    +0.14   0.5%
1096    +0.11   0.4%
1023    +0.10   0.4%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1296    +0.14      0.04
1601    +0.11      0.03
201     +0.10      0.02
Negative Logits
Token           Logit
<?              -0.62
ļ               -0.59
šķ              -0.57
izvē            -0.55
يتيمه           -0.55
āci             -0.53
specialmente    -0.52
pertanto        -0.52
clc             -0.51
Ainda           -0.51
Positive Logits
Token           Logit
tweet           1.44
tweets          1.31
tweeting        1.24
tweeted         1.22
tweet           1.19
Tweet           1.18
Tweets          1.09
tweets          1.04
                1.01
                0.98
Activations Density: 0.038%