INDEX
Explanations
words related to locations and historical events
Neuron Alignment
Index   Value   % of L₁
687     +0.15   0.5%
1404    +0.12   0.4%
331     +0.10   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
687     +0.15      0.06
1404    +0.12      0.05
331     +0.10      0.04
Negative Logits
YMS               -0.60
gameserver        -0.55
دیکھیے            -0.55
ggars             -0.53
inappropriés      -0.51
exitRule          -0.50
URLEncoder        -0.50
MMMM              -0.49
AnchorTagHelper   -0.49
pidou             -0.49
Positive Logits
shenan       1.14
intersper    0.99
encomp       0.99
milf         0.99
unspeak      0.94
hairc        0.92
impra        0.91
increa       0.89
unwarran     0.89
disagre      0.87
Activations Density 0.160%