INDEX
Explanations
mentions of social concepts like social media, social service, and social inclusion
Neuron Alignment
Index   Value   % of L₁
50      +0.20   1.3%
1870    +0.13   0.9%
1828    +0.10   0.7%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1125    +0.20      0.05
501     +0.13      0.04
1624    +0.10      0.05
Negative Logits
<bos>       -3.35
ⓧ           -0.86
/*++        -0.76
<?          -0.76
enumerate   -0.72
asked       -0.70
/**         -0.68
won         -0.68
for         -0.68
/**         -0.68
Positive Logits
Augu        2.03
aen         1.96
fta         1.90
Juf         1.87
affor       1.87
ftu         1.84
ftre        1.80
bandung     1.79
increa      1.75
Minang      1.75
Activations Density 0.090%