INDEX
Explanations
terms related to legal and social issues, potentially focusing on legal applications and implications
Neuron Alignment
Index   Value   % of L₁
752     +0.09   0.2%
738     +0.09   0.2%
420     +0.08   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1885    +0.09      0.03
396     +0.09      0.02
752     +0.08      0.03
Negative Logits
<bos>             -0.96
upported          -0.49
ostasis           -0.45
InjectAttribute   -0.45
šech              -0.45
pisze             -0.44
poved             -0.43
miary             -0.43
<_>               -0.43
раздо             -0.43
Positive Logits
lemp      0.73
Timp      0.68
Mâ        0.67
peculi    0.66
democra   0.65
mamp      0.64
kela      0.64
Kela      0.63
mavi      0.62
ristor    0.60
Activations Density: 0.242%