Explanations
mentions of the social media platform Twitter
Neuron Alignment
Index   Value   % of L₁
1023    +0.14   0.5%
501     +0.10   0.4%
650     +0.10   0.3%
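The page does not define the "% of L₁" column. One plausible reading is that it gives each neuron's share of the L₁ norm (sum of absolute values) of the feature's weight vector. The sketch below illustrates that reading only; the function name `l1_share` and the sample weights are hypothetical and are not taken from the dashboard.

```python
def l1_share(weights, index):
    """Return |weights[index]| as a fraction of the vector's L1 norm.

    Assumption: "% of L1" means one entry's absolute value divided by
    the sum of absolute values of all entries. The dashboard itself
    does not confirm this definition.
    """
    l1_norm = sum(abs(w) for w in weights)
    if l1_norm == 0:
        raise ValueError("L1 norm is zero; share is undefined")
    return abs(weights[index]) / l1_norm


# Hypothetical 4-entry weight vector, for illustration only.
example_weights = [0.14, -0.10, 0.10, 0.06]
share_percent = 100 * l1_share(example_weights, 0)
print(f"{share_percent:.1f}% of L1")
```

Under this reading, a top value of +0.14 accounting for only 0.5% of L₁ would mean the weight is spread thinly across many neurons rather than concentrated in a few.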
Correlated Neurons
Index   P. Corr.   Cos Sim.
1023    +0.14      0.04
650     +0.10      0.03
613     +0.10      0.04
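The two similarity columns, presumably Pearson correlation ("P. Corr.") and cosine similarity ("Cos Sim."), measure different things: Pearson correlation compares mean-centered series, while cosine similarity compares the raw vectors. A minimal sketch of the standard definitions follows. The page does not state which vectors it compares, so the function names and sample data are hypothetical.

```python
import math


def cosine_sim(x, y):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def pearson_corr(x, y):
    """Pearson correlation: cosine similarity of the mean-centered vectors."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    x_centered = [a - mean_x for a in x]
    y_centered = [b - mean_y for b in y]
    return cosine_sim(x_centered, y_centered)


# Hypothetical data: y is x shifted by a constant offset.
x = [1.0, 2.0, 3.0]
y = [4.0, 5.0, 6.0]
print(pearson_corr(x, y))  # offset is removed by centering
print(cosine_sim(x, y))    # offset changes the angle between raw vectors
```

Because centering removes constant offsets, the two measures can disagree on the same pair of vectors, which is consistent with the table showing different values in each column.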
Negative Logits
izvē            -0.70
Ainda           -0.62
vairāk          -0.60
يتيمه           -0.59
ỡng             -0.57
<?              -0.56
DebuggerStep    -0.56
IActionResult   -0.56
BindingResult   -0.56
EndInit         -0.55
Positive Logits
                1.27
                1.27
                1.18
                1.12
tweeting        1.09
tweets          1.07
Tweets          1.03
twit            1.02
tweet           1.00
                0.95
Activations Density 0.037%