INDEX
Explanations
courts, judgments, and legal cases
Neuron Alignment
Index   Value   % of L₁
1499    +0.13   0.4%
453     +0.10   0.3%
766     +0.09   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1499    +0.13      0.04
1585    +0.10      0.03
2043    +0.09      0.03
Negative Logits
nutella      -0.92
tupperware   -0.86
purée        -0.79
polenta      -0.76
bonbons      -0.68
hairc        -0.68
Spinach      -0.67
Może         -0.67
sprigs       -0.65
cushi        -0.64
Positive Logits
kasa         1.04
Amerik       0.94
silikon      0.93
Teks         0.91
induk        0.88
panik        0.87
duk          0.87
Ukraina      0.85
kaos         0.84
akut         0.84
Activations Density 0.135%