INDEX
Explanations
words related to legal terms and locations
Neuron Alignment
Index   Value   % of L₁
1350    +0.17   0.7%
805     +0.16   0.7%
397     +0.16   0.7%
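The "% of L₁" column is not defined on this page. One plausible reading, assumed here rather than confirmed by the source, is that it reports a neuron's share of the L₁ norm of the feature's weight vector: the absolute value of that neuron's entry divided by the sum of absolute values over all entries. A minimal Python sketch of that reading, using a hypothetical helper name and made-up weights:

```python
def l1_share(weights, index):
    """Return |weights[index]| as a fraction of the vector's L1 norm.

    Assumption: this mirrors what the "% of L1" column means; the
    dashboard itself does not state the formula.
    """
    l1_norm = sum(abs(w) for w in weights)
    if l1_norm == 0:
        raise ValueError("L1 norm is zero; the share is undefined")
    return abs(weights[index]) / l1_norm


# Made-up weights for illustration only (not taken from the dashboard).
example_weights = [0.5, -0.25, 0.25]
share = l1_share(example_weights, 0)
print(f"{share:.1%}")
```

Under this reading, a value of 0.17 at 0.7% would imply a total L₁ norm of roughly 24 (0.17 / 0.007), up to rounding in the displayed figures.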
Correlated Neurons
Index   P. Corr.   Cos Sim.
981     +0.17      0.06
1637    +0.16      0.03
805     +0.16      0.04
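The two columns above report different quantities for the same neuron. Assuming "P. Corr." is the Pearson correlation and "Cos Sim." the cosine similarity between two paired vectors (the page does not say which vectors are compared), the difference is that Pearson subtracts each vector's mean before comparing, while cosine similarity does not. A small Python sketch with hypothetical helper names and made-up data:

```python
import math


def pearson(xs, ys):
    """Pearson correlation: cosine similarity of the mean-centered vectors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def cosine(xs, ys):
    """Cosine similarity: normalized dot product, with no mean subtraction."""
    dot = sum(x * y for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum(x * x for x in xs))
    norm_y = math.sqrt(sum(y * y for y in ys))
    return dot / (norm_x * norm_y)


# Made-up vectors for illustration only (not dashboard data).
a = [1.0, 2.0, 3.0]
b = [11.0, 12.0, 13.0]  # same shape as a, shifted by a constant
print(pearson(a, b), cosine(a, b))
```

Because of the mean subtraction, the two measures can diverge on the same pair of vectors, so differing values in the two columns are not by themselves a sign of an error.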
Negative Logits
twimg            -0.64
kasa             -0.52
ed               -0.51
principalTable   -0.50
makeStyles       -0.48
yadi             -0.48
moud             -0.48
|()              -0.48
Maulana          -0.48
harap            -0.48
Positive Logits
fath             0.94
versace          0.94
embra            0.92
milf             0.92
Euph             0.91
peppa            0.89
madonna          0.88
verona           0.87
archivio         0.86
Pamph            0.86
Activations Density 0.235%