INDEX
Explanations
text related to political and social issues, legislation, and rights
Neuron Alignment
Index   Value   % of L₁
50      +0.32   1.1%
2019    +0.19   0.7%
964     +0.13   0.5%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1959    +0.32      0.17
50      +0.19      0.15
2019    +0.13      0.15
Negative Logits
<bos>        -1.46
ⓧ            -0.65
ladri        -0.63
<?           -0.63
sentimenti   -0.61
otides       -0.61
sogni        -0.61
ostante      -0.60
dici         -0.60
braccia      -0.60
Positive Logits
Juf          0.93
unspeak      0.89
Illus        0.85
ineffec      0.82
unlaw        0.82
Intere       0.82
Expt         0.82
luxuriant    0.82
impractica   0.82
Gorb         0.82
Activations Density: 3.216%