INDEX
Explanations
mentions of legal and organizational structures
Neuron Alignment
Index   Value   % of L₁
204     +0.22   0.8%
1092    +0.19   0.7%
1044    +0.17   0.6%
Correlated Neurons
Index   P. Corr.   Cos Sim.
204     +0.22      0.03
1092    +0.19      0.03
1044    +0.17      0.03
Negative Logits
Token   Logit
con     -0.60
“       -0.60
‘       -0.59
em      -0.59
&       -0.59
char    -0.58
ce      -0.58
Пе      -0.58
de      -0.57
Mi      -0.57
Positive Logits
Token    Logit
excru    1.65
ftu      1.62
!...     1.61
?...     1.60
fatis    1.58
perfet   1.57
inev     1.57
tranf    1.54
napoli   1.54
lidl     1.53
Activations Density 0.087%