INDEX
Explanations
terms related to philosophical and legal theoretical concepts
Neuron Alignment
Index   Value   % of L₁
1871    +0.10   0.3%
513     +0.10   0.3%
1654    +0.09   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
981     +0.10      0.05
1654    +0.10      0.04
1363    +0.09      0.03
Negative Logits
Token   Logit
<bos>   -1.12
s       -0.60
(       -0.60
g       -0.57
e       -0.57
t       -0.56
and     -0.56
Sy      -0.56
↵↵      -0.56
c       -0.56
Positive Logits
Token   Logit
meis    1.85
fatis   1.82
vns     1.74
paff    1.73
vne     1.68
fua     1.64
marte   1.64
ftu     1.64
fta     1.63
waer    1.63
Activation Density: 0.209%